ON MIXED SINGLE SAMPLE EXPERIMENTS


By Leonard Cohen
College of the City of New York 


1. Introduction and summary. William Kruskal [1], Howard Raiffa [2], J. L. Hodges, Jr. and E. L. Lehmann [4], have shown that in certain Neyman-Pearson type problems of testing a simple hypothesis against a simple alternative, determining the sample size by means of a chance device yields improvements over fixed sample size procedures. The purpose of this paper is not only to investigate the general problem of randomizing over fixed sample size tests of a simple hypothesis against a simple alternative, but also to investigate randomizing over other fixed sample size procedures in topics such as confidence interval estimation, the k-decision problem, etc.

In Section 2, a fixed sample size test of a simple hypothesis against a simple alternative is identified with an operating characteristic $(\alpha, \beta, n)$, where $\alpha$ denotes the probability of a type I error, $\beta$ denotes the probability of a type II error, and $n$ denotes the sample size. A mixed single sample test is defined as a sequence of quadruples $(\gamma_i, \alpha_i, \beta_i, n_i)$, where $\gamma_i \ge 0$, $\sum_{i=1}^{\infty} \gamma_i = 1$, where $(\alpha_i, \beta_i, n_i)$ is a fixed sample size test and where $\gamma_i$ is interpreted as the probability of using the fixed sample size test $(\alpha_i, \beta_i, n_i)$ for $i = 1, 2, \ldots$. A mixed single sample test is identified with an operating characteristic $(\alpha, \beta, n) = \sum_{i=1}^{\infty} \gamma_i (\alpha_i, \beta_i, n_i)$. For each non-negative integer $n$, the class $A_n$ of admissible fixed sample size procedures of sample size $n$ is defined in an obvious way. We define $A = \bigcup_{n=0}^{\infty} A_n$ and $A^*$ as the convex hull of $A$. It is not necessarily true that $A^*$ is closed. An example is given to show this. However, it is true that the lower boundary of $A^*$ is a subset of $A^*$, so that the lower boundary of $A^*$ determines a minimally complete class, $\mathcal{A}$, of mixed single sample tests. The tests in $\mathcal{A}$ are characterized from a Bayes point of view and a technique for constructing the tests in $\mathcal{A}$ is given.

In Section 3, the technique is applied to tests on the mean of a normal distribution with known variance. It is shown that the tests in $\mathcal{A}$ are either

(a) fixed sample size tests, or 

(b) mixtures of at most two fixed sample size tests. 

It is shown that there exists a minimal subset $\mathcal{A}_0$ of $A$ such that all improved randomized procedures are of the form $(\alpha, \beta, n) = \gamma(0, 1, 0) + (1 - \gamma)(\alpha_0, \beta_0, n_0)$ or $(\alpha, \beta, n) = \gamma(1, 0, 0) + (1 - \gamma)(\alpha_0, \beta_0, n_0)$, where $0 < \gamma < 1$ and where $(\alpha_0, \beta_0, n_0) \in \mathcal{A}_0$. It is then shown how to construct $\mathcal{A}_0$. The following problems (of the Neyman-Pearson type) are solved:

(a) Given $\alpha$ and $\beta$, how can we find the test in $\mathcal{A}$ with the given $\alpha$ and $\beta$?

(b) Given $\alpha$ and $n$, how can we find the test in $\mathcal{A}$ with the given $\alpha$ and $n$?

Numerical examples are worked out.

In Section 4, the technique is applied to tests on the mean of a binomial dis- 
tribution. Although no general results were obtained, numerical examples of 
interest are given. 

In Section 5, the technique is applied to tests on the range of a rectangular distribution (when one end point is known). It is shown that if $\alpha > 0$, $n > 0$, and $(\alpha, \beta, n) \in A_n$, then $(\alpha, \beta, n) \notin \mathcal{A}$. The tests in $\mathcal{A}$ are characterized by a simple equation which makes it easy to

(a) determine whether a given point $(\alpha, \beta, n)$ belongs to $\mathcal{A}$, and

(b) construct any test in $\mathcal{A}$, given two of the three coordinates.

It is shown that if $(\alpha, \beta, n) \in A_n$, then there exists a test $(\alpha, \beta, n')$ in $\mathcal{A}$ such that $n' = (1 - \alpha)n$. Hence, the fractional saving in the expected sample size achieved by randomization is equal to $\alpha$.

In Section 6, it is shown that in tests on the mean of a rectangular distribution 
(with known range), it never pays to randomize. 

In Section 7, confidence intervals are evaluated in terms of confidence coefficient ($\alpha$), expected length ($L$) and expected sample size ($n$). For the problem of obtaining a confidence interval for the mean of a normal distribution with known variance, “improved” randomized procedures exist and are of the form $(\alpha, L, n) = \gamma(0, 0, 0) + (1 - \gamma)(\alpha', L', n')$, where $0 < \gamma < 1$ and where $(\alpha', L', n')$ is a fixed sample size confidence interval procedure. Clearly, the randomized procedures obtained are of such a nature that the question of confidence intervals evaluated in terms of expected length and/or expected sample size is thrown open to discussion.

In Section 8, the k-decision problem is discussed. It is shown that improvements 
can be obtained by randomization. 


In Section 9, the problem of applying mixed single sample tests of a composite hypothesis against a composite alternative is discussed.

In Section 10, mixed single sample procedures are compared to Wald’s se- 
quential probability ratio test in the problem of tests on the range of a rectangular 
distribution when one endpoint is known and are shown to be efficient in a certain 
sense. 

In Section 11, the estimation problem is mentioned. It is shown that in most 
practical problems, fixed sample size procedures are optimal. 

In Section 12, applications of mixed single sample tests are discussed. 


2. Testing a simple hypothesis against a simple alternative. Let $X$ denote a random variable with density function (or discrete probability function) $f(x, \theta)$. We wish to test the hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$. In the sequel, we shall restrict ourselves exclusively to fixed sample size tests, both randomized and non-randomized, and mixtures of such tests. Any test of the preceding kinds will be identified with an operating characteristic $(\alpha, \beta, n)$, where $\alpha$ denotes the probability of a type I error, $\beta$ denotes the probability of a type II error, and $n$ denotes the expected number of observations. If two tests have the same operating characteristic, they will be considered equivalent.

Let $(x_1, x_2, \ldots, x_n)$ denote a sample of $n$ independent observations on $X$. Let $\delta_n$ denote a real-valued, measurable function of $n$ variables whose range lies in the closed interval $[0, 1]$. The expression $\delta_n(x_1, x_2, \ldots, x_n)$ is interpreted as the probability of rejecting $H_0$ if $(x_1, x_2, \ldots, x_n)$ is observed. Let $\Delta_n$ denote the class of functions $\{\delta_n\}$ of the preceding type.

Definition 1. For any integer $n > 0$, let $S_n = \{(\alpha, \beta, n): \alpha = E(\delta_n \mid \theta_0),\ \beta = E(1 - \delta_n \mid \theta_1),\ \delta_n \in \Delta_n\}$. $S_n$ is the class of tests of fixed sample size $n$. We define $S_0 = \{(\alpha, \beta, 0): 0 \le \alpha \le 1,\ \alpha + \beta = 1\}$.

Definition 2. For any integer $n \ge 0$, let $A_n = \{(\alpha, \beta, n)$: a) $(\alpha, \beta, n) \in S_n$, and b) there exists no other test $(\alpha', \beta', n)$ belonging to $S_n$ with the property that $\alpha' \le \alpha$, $\beta' \le \beta$, at least one of these inequalities being strict$\}$.

The set $A_n$ is the class of admissible procedures based on samples of fixed size $n$, and is known to be complete. See Fig. 1.

Definition 3. Let $A = \bigcup_{n=0}^{\infty} A_n$.

Definition 4. Let $A^* = \{(\alpha, \beta, n): (\alpha, \beta, n) = \sum_{i=0}^{\infty} \gamma_i (\alpha_i, \beta_i, n_i)$ where $\gamma_i \ge 0$, $\sum_{i=0}^{\infty} \gamma_i = 1$, and $(\alpha_i, \beta_i, n_i) \in A$ for $i = 0, 1, 2, \ldots\}$.
$\gamma_i$ is interpreted as the probability of selecting the fixed sample size test $(\alpha_i, \beta_i, n_i)$ for $i = 0, 1, 2, \ldots$. $A^*$ is the convex hull of $A$.

Definition 5. Let $\mathcal{A} = \{(\alpha, \beta, n)$: a) $(\alpha, \beta, n) \in A^*$, and b) there exists no other test $(\alpha', \beta', n')$ belonging to $A^*$ with the property that $\alpha' \le \alpha$, $\beta' \le \beta$, $n' \le n$, at least one of these inequalities being strict$\}$.

The set $\mathcal{A}$ is the class of admissible mixed single sample tests.

We next wish to show that $\mathcal{A}$ is complete, i.e., for any test $(\alpha', \beta', n')$ not in $\mathcal{A}$, there exists a test $(\alpha, \beta, n)$ in $\mathcal{A}$ such that $\alpha \le \alpha'$, $\beta \le \beta'$, $n \le n'$, at least one of these inequalities being strict. If, in general, $A^*$ were closed, it would follow that $\mathcal{A}$ is complete. However, $A^*$ is not necessarily closed, as the following example will illustrate.
Example. Let

$$f(x, \theta) = 1 \ \text{ if } \theta \le x \le \theta + 1, \qquad = 0 \ \text{ elsewhere.}$$

We wish to test the hypothesis $H_0: \theta = 0$ against the alternative $H_1: \theta = \theta_1$, where $0 < \theta_1 < 1$. A simple calculation shows that

$$(1) \qquad A_n = \{(\alpha, \beta, n): 0 \le \alpha \le (1 - \theta_1)^n,\ \alpha + \beta = (1 - \theta_1)^n\}.$$

We define a sequence $\{(\alpha_k, \beta_k, n_k)\}$, where $(\alpha_k, \beta_k, n_k) = (1 - 1/k)(0, 1, 0) + (1/k)(0, (1 - \theta_1)^k, k)$. Clearly $\lim_{k \to \infty}(\alpha_k, \beta_k, n_k) = (0, 1, 1)$. However, $(0, 1, 1) \notin A^*$. To prove this, assume $(0, 1, 1) \in A^*$. Then, since $A^*$ is a three-dimensional convex set, $(0, 1, 1)$ can be expressed as a convex linear combination of at most four points in $A$, i.e., $(0, 1, 1) = \sum_{i=1}^{4} \gamma_i(\alpha_i, \beta_i, n_i)$, where $\gamma_i \ge 0$, $\sum_{i=1}^{4} \gamma_i = 1$ and $(\alpha_i, \beta_i, n_i) \in A$ for $i = 1, 2, 3, 4$. Since $\sum_{i=1}^{4} \gamma_i \beta_i = 1$, it follows that $\beta_i = 1$ if $\gamma_i > 0$. However, if $\beta_i = 1$, it follows from (1) that $n_i = 0$, contradicting the assumption $\sum_{i=1}^{4} \gamma_i n_i = 1$. Q.E.D.
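The sequence used in this example can be tabulated directly. The following is a minimal sketch in Python; the function name `mixture_point` and the value $\theta_1 = 1/2$ are illustrative assumptions, not part of the paper. It shows the expected sample size staying at 1 while $\beta_k$ approaches 1 without reaching it.

```python
from fractions import Fraction

def mixture_point(k, theta1):
    """Operating characteristic of (1 - 1/k)(0, 1, 0) + (1/k)(0, (1 - theta1)**k, k)."""
    w = Fraction(1, k)                      # weight on the sample-size-k test
    alpha_k = Fraction(0)                   # both components have alpha = 0
    beta_k = (1 - w) * 1 + w * (1 - theta1) ** k
    n_k = (1 - w) * 0 + w * k               # expected sample size, always 1
    return alpha_k, beta_k, n_k

theta1 = Fraction(1, 2)  # illustrative choice; any 0 < theta1 < 1 behaves the same way
for k in (1, 10, 100, 1000):
    alpha_k, beta_k, n_k = mixture_point(k, theta1)
    print(k, float(alpha_k), float(beta_k), float(n_k))
```

Every term of the sequence lies in $A^*$, yet the limit $(0, 1, 1)$ does not, which is the sense in which $A^*$ fails to be closed.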

In order to show that $\mathcal{A}$ is complete, we define $A^*_L = \{(\alpha, \beta, n)$: (a) $(\alpha, \beta, n)$ is a boundary point of $A^*$, and (b) there exists no test $(\alpha', \beta', n')$ belonging to $A^*$ such that $\alpha' \le \alpha$, $\beta' \le \beta$, $n' \le n$, at least one of these inequalities being strict$\}$.

The set $A^*_L$ is the “lower” boundary of $A^*$. Clearly, $\mathcal{A} \subset A^*_L$. We shall now prove an important theorem.

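THEOREM 1. $A^*_L \subset A^*$.

PROOF. Suppose $(\alpha, \beta, n) \in A^*_L$. Then, since $(\alpha, \beta, n)$ is a boundary point of $A^*$, there exists a sequence of points $\{(\alpha_k, \beta_k, n_k)\}$ belonging to $A^*$ such that

$$(\alpha, \beta, n) = \lim_{k \to \infty} (\alpha_k, \beta_k, n_k).$$

Since $A^*$ is a three dimensional convex set, each point $(\alpha_k, \beta_k, n_k)$ of this sequence can be expressed as a convex linear combination of at most four points in $A$; i.e., for each $k$, there exist numbers $\gamma_{ik}, \alpha_{ik}, \beta_{ik}, n_{ik}$ such that $(\alpha_k, \beta_k, n_k) = \sum_{i=1}^{4} \gamma_{ik}(\alpha_{ik}, \beta_{ik}, n_{ik})$, where $\gamma_{ik} \ge 0$, $\sum_{i=1}^{4} \gamma_{ik} = 1$ and $(\alpha_{ik}, \beta_{ik}, n_{ik}) \in A$ for $i = 1, 2, 3, 4$. Without any loss of generality, we can assume that the sequences $\{\gamma_{ik}\}$, $\{\alpha_{ik}\}$ and $\{\beta_{ik}\}$ are convergent for $i = 1, 2, 3, 4$ as $k$ tends to infinity. Let $\gamma_i = \lim_{k \to \infty} \gamma_{ik}$, $\alpha_i = \lim_{k \to \infty} \alpha_{ik}$, $\beta_i = \lim_{k \to \infty} \beta_{ik}$ for $i = 1, 2, 3, 4$. Clearly, $\gamma_i \ge 0$ for $i = 1, 2, 3, 4$ and $\sum_{i=1}^{4} \gamma_i = 1$. Before proceeding with the proof of Theorem 1, we prove a useful lemma.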

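LEMMA 1. If $\gamma_i > 0$, there exists a number $N_i$ such that $n_{ik} \le N_i$ for all $k$.

PROOF. Since $\lim_{k \to \infty} n_k = n$, there exists a positive integer $K$ such that if $k > K$, $n_k < n + 1$. Furthermore, since $\lim_{k \to \infty} \gamma_{ik} = \gamma_i > 0$, there exists a number $K_i$ such that if $k > K_i$, $\gamma_{ik} > \tfrac{1}{2}\gamma_i$. Let $M_i = \max(K, K_i)$. Then, if $k > M_i$, $\tfrac{1}{2}\gamma_i n_{ik} \le \gamma_{ik} n_{ik} \le \sum_{j=1}^{4} \gamma_{jk} n_{jk} = n_k < n + 1$. Thus, if $k > M_i$, $n_{ik} < 2(n + 1)/\gamma_i$. Then, $N_i = \max[n_{i1}, n_{i2}, \ldots, n_{iM_i}, 2(n + 1)/\gamma_i]$ is the required number, proving the lemma.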

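We now proceed with the proof of Theorem 1. Consider four cases.

Case 1. $\gamma_i > 0$, $i = 1, 2, 3, 4$.

Let $N = \max(N_1, N_2, N_3, N_4)$ where $N_i$ is defined in Lemma 1. Then, since $0 \le n_{ik} \le N$ for $i = 1, 2, 3, 4$ and all $k$, the sequences $\{n_{ik}\}$ are bounded. Hence, for each $i$, there exists a convergent subsequence which we denote by $\{\tilde{n}_{ik}\}$. Let $\{\tilde{\alpha}_{ik}\}$, $\{\tilde{\beta}_{ik}\}$ and $\{\tilde{\gamma}_{ik}\}$ denote respectively the subsequences of $\{\alpha_{ik}\}$, $\{\beta_{ik}\}$ and $\{\gamma_{ik}\}$ corresponding to the convergent subsequence $\{\tilde{n}_{ik}\}$ of $\{n_{ik}\}$. Let $n_i = \lim_{k \to \infty} \tilde{n}_{ik}$. Clearly,

$$(\alpha, \beta, n) = \lim_{k \to \infty} (\alpha_k, \beta_k, n_k) = \lim_{k \to \infty} \sum_{i=1}^{4} \tilde{\gamma}_{ik}(\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) = \sum_{i=1}^{4} \lim_{k \to \infty} \tilde{\gamma}_{ik}(\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) = \sum_{i=1}^{4} \gamma_i(\alpha_i, \beta_i, n_i).$$

Since $(\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) \in A$ for all $i$ and $k$, and since $A$ is closed,

$$\lim_{k \to \infty} (\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) = (\alpha_i, \beta_i, n_i) \in A;$$

furthermore, since $\gamma_i > 0$, $i = 1, 2, 3, 4$, and $\sum_{i=1}^{4} \gamma_i = 1$, the point $\sum_{i=1}^{4} \gamma_i(\alpha_i, \beta_i, n_i)$ belongs to $A^*$. Hence $(\alpha, \beta, n) \in A^*$.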
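The more difficult case to prove is Case 2.

Case 2. Exactly one of the $\gamma_i$'s is 0.

To fix ideas, suppose $\gamma_1 = 0$, $\gamma_2 > 0$, $\gamma_3 > 0$, $\gamma_4 > 0$. Let $N = \max(N_2, N_3, N_4)$. In a manner analogous to that used in Case 1, we define sequences $\{\tilde{\alpha}_{ik}\}$, $\{\tilde{\beta}_{ik}\}$, $\{\tilde{n}_{ik}\}$ and $\{\tilde{\gamma}_{ik}\}$ for $i = 2, 3, 4$. We define new sequences

$$\alpha'_k = \tilde{\gamma}_{1k}(0) + \sum_{i=2}^{4} \tilde{\gamma}_{ik}\tilde{\alpha}_{ik},$$

$$\beta'_k = \tilde{\gamma}_{1k}(1) + \sum_{i=2}^{4} \tilde{\gamma}_{ik}\tilde{\beta}_{ik},$$

$$n'_k = \tilde{\gamma}_{1k}(0) + \sum_{i=2}^{4} \tilde{\gamma}_{ik}\tilde{n}_{ik},$$

where $\{\tilde{\gamma}_{1k}\}$ is the corresponding subsequence of $\{\gamma_{1k}\}$. It is easily seen that

$$\lim_{k \to \infty} \alpha'_k = \alpha, \qquad \lim_{k \to \infty} \beta'_k = \beta, \qquad \lim_{k \to \infty} n'_k \le n.$$

Since $(\alpha'_k, \beta'_k, n'_k) \in A^*$ for each $k$, and since $(\alpha, \beta, n) \in A^*_L$, it follows that the inequality $\lim_{k \to \infty} n'_k < n$ cannot hold. Hence,

$$(\alpha, \beta, n) = \lim_{k \to \infty} (\alpha'_k, \beta'_k, n'_k) = \lim_{k \to \infty} \sum_{i=2}^{4} \tilde{\gamma}_{ik}(\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) = \sum_{i=2}^{4} \lim_{k \to \infty} \tilde{\gamma}_{ik}(\tilde{\alpha}_{ik}, \tilde{\beta}_{ik}, \tilde{n}_{ik}) = \sum_{i=2}^{4} \gamma_i(\alpha_i, \beta_i, n_i).$$

Using the argument in Case 1, we find that $(\alpha, \beta, n) \in A^*$.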
Case 3. Two of the $\gamma_i$'s are 0. The proof of Case 3 is analogous to the proof of Case 2.

Case 4. Three of the $\gamma_i$'s are 0. The proof of Case 4 is analogous to the proof of Case 2.

COROLLARY 1. $\mathcal{A} = A^*_L$.

COROLLARY 2. $\mathcal{A}$ is complete.

PROOF. Let $(\alpha', \beta', n')$ be a test which does not belong to $\mathcal{A}$. Let

$$A^*_{(\alpha', \beta')} = \{(\alpha, \beta, n): \alpha = \alpha',\ \beta = \beta' \text{ and } (\alpha, \beta, n) \in A^*\}.$$

$A^*_{(\alpha', \beta')}$ is non-empty since $(\alpha', \beta', n') \in A^*_{(\alpha', \beta')}$. Let

$$N = N_{(\alpha', \beta')} = \inf \{n: (\alpha', \beta', n) \in A^*_{(\alpha', \beta')}\}.$$

Then $(\alpha', \beta', N) \in \mathcal{A} = A^*_L$, where $N < n'$.

Note: It is possible to show that $\mathcal{A}$ is complete using a different approach. If we define $S = \bigcup_{i=0}^{\infty} S_i$ and $S^*$ as the convex hull of $S$, it can be shown that $S^*$ is closed. This implies that $\mathcal{A}$ is complete. However, to prove that $S^*$ is closed requires a technique similar to that used in proving Theorem 1.

THEOREM 2. If $f(x, \theta_0) = 0$ if and only if $f(x, \theta_1) = 0$, then a necessary and sufficient condition for $(\alpha, \beta, n)$ to belong to $\mathcal{A}$ is that for some non-negative $a$ and $b$ and positive $c$, we have

$$a\alpha + b\beta + cn = \min_{(\alpha', \beta', n') \in A^*} \{a\alpha' + b\beta' + cn'\}.$$

PROOF. To prove the sufficiency of the condition, we consider 4 cases.

Case 1. $a = 0$, $b = 0$, $c > 0$. Then $a\alpha' + b\beta' + cn' = cn'$ is minimized only by tests $(\alpha, \beta, 0)$ belonging to $A_0$. However, $A_0 \subset \mathcal{A}$, proving the sufficiency of the condition if Case 1 holds.

Case 2. $a = 0$, $b > 0$, $c > 0$. Then $a\alpha' + b\beta' + cn' = b\beta' + cn'$ is minimized only by the test $(1, 0, 0)$, which belongs to $A_0$.

Case 3. $a > 0$, $b = 0$, $c > 0$. (Similar to Case 2.)

Case 4. $a > 0$, $b > 0$, $c > 0$. Then, it is well known, and can be easily proved, that any test $(\alpha, \beta, n)$ such that $a\alpha + b\beta + cn = \min_{(\alpha', \beta', n') \in A^*} (a\alpha' + b\beta' + cn')$ belongs to $\mathcal{A}$.

To prove the necessity of the condition, we assume $(\alpha, \beta, n) \in \mathcal{A}$.

(i) If $n = 0$, choose $a = 0$, $b = 0$, $c = 1$.

(ii) If $n > 0$, then it is well known in the theory of convex sets that there exist non-negative numbers $a$, $b$ and $c$ such that

$$a\alpha + b\beta + cn = \min_{(\alpha', \beta', n') \in A^*} (a\alpha' + b\beta' + cn').$$

It remains to show that $c > 0$. Assume $c = 0$. Then

$$a\alpha + b\beta = \min_{(\alpha', \beta', n') \in A^*} (a\alpha' + b\beta') = 0.$$

Since $(\alpha, \beta, n) \in \mathcal{A}$, there exist numbers $\gamma_i, \alpha_i, \beta_i, n_i$ such that $(\alpha, \beta, n) = \sum_{i=1}^{4} \gamma_i(\alpha_i, \beta_i, n_i)$, where $\gamma_i \ge 0$, $\sum_{i=1}^{4} \gamma_i = 1$, $(\alpha_i, \beta_i, n_i) \in A$ for $i = 1, 2, 3, 4$. Thus,

$$a\alpha + b\beta = a \sum_{i=1}^{4} \gamma_i \alpha_i + b \sum_{i=1}^{4} \gamma_i \beta_i = 0.$$

Since both $a$ and $b$ cannot equal 0, either $\alpha = 0$ or $\beta = 0$. Assume $\alpha = 0$. Then, if $\gamma_i > 0$, $\alpha_i = 0$. Using the fact that $f(x, \theta_0) = 0$ if and only if $f(x, \theta_1) = 0$, it follows that if $\alpha_i = 0$, $\beta_i = 1$. Hence, $(\alpha, \beta, n) = (0, 1, n)$. But $(0, 1, n) \notin \mathcal{A}$, since $(0, 1, 0)$ is preferred. Thus we are led to a contradiction of the fact that $(\alpha, \beta, n) \in \mathcal{A}$. If we assume $\beta = 0$, we are led to a similar contradiction. Therefore, the assumption $c = 0$ is false. Theorem 2 is thus proved.

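Theorem 2 states, in effect, that the problem of generating $\mathcal{A}$ reduces to constructing tests $(\alpha, \beta, n)$ which minimize the expression $a\alpha + b\beta + cn$ for all choices of non-negative $a$ and $b$ and positive $c$. The cases where either $a$ or $b$ is 0 were discussed and disposed of in proving Theorem 2. The main problem, then, is to construct the tests $(\alpha, \beta, n)$ which minimize the expression $a\alpha + b\beta + cn$ when $a > 0$ and $b > 0$. We proceed as follows: without any loss of generality we may assume that $a + b = 1$ and write $a = \pi$ and $b = 1 - \pi$, where $0 < \pi < 1$. Then, we wish to find the tests $(\alpha, \beta, n)$ in $\mathcal{A}$ such that

$$\pi\alpha + (1 - \pi)\beta + cn = \min_{(\alpha', \beta', n') \in A^*} [\pi\alpha' + (1 - \pi)\beta' + cn'].$$

Clearly,

$$\begin{aligned}
\min_{(\alpha', \beta', n') \in A^*} [\pi\alpha' + (1 - \pi)\beta' + cn']
&= \min \Big\{ \pi \sum_{i=0}^{\infty} \gamma_i \alpha_i + (1 - \pi) \sum_{i=0}^{\infty} \gamma_i \beta_i + c \sum_{i=0}^{\infty} \gamma_i n_i \Big\} \\
&= \min \Big\{ \sum_{i=0}^{\infty} \gamma_i [\pi\alpha_i + (1 - \pi)\beta_i] + c \sum_{i=0}^{\infty} \gamma_i n_i \Big\} \\
&= \min_{N \ge 0} \Big\{ cN + \min_{\sum \gamma_i n_i = N} \sum_{i=0}^{\infty} \gamma_i \min_{(\alpha_i, \beta_i, n_i) \in A_{n_i}} [\pi\alpha_i + (1 - \pi)\beta_i] \Big\},
\end{aligned}$$

where the first two minima are taken over all $\gamma_i, \alpha_i, \beta_i, n_i$ with $\gamma_i \ge 0$, $\sum_{i=0}^{\infty} \gamma_i = 1$ and $(\alpha_i, \beta_i, n_i) \in A$, $i = 0, 1, 2, \ldots$, and the inner minimum in the last line is taken over all $\gamma_i, n_i$ with $\gamma_i \ge 0$, $\sum_{i=0}^{\infty} \gamma_i = 1$ and $\sum_{i=0}^{\infty} \gamma_i n_i = N$.

It should be noted that the operation “$\min_{N \ge 0}$” is not restricted to integral values of $N$.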
From the above, it is clear that the desired minimization can be accomplished in 3 steps, which we shall now describe in detail.

Step 1. We can, for each $n_i$, find the tests $(\alpha_i, \beta_i, n_i)$ belonging to $A_{n_i}$ which minimize the expression $\pi\alpha_i + (1 - \pi)\beta_i$. For each $n_i$, let

$$R_\pi(n_i) = \min_{(\alpha_i, \beta_i, n_i) \in A_{n_i}} [\pi\alpha_i + (1 - \pi)\beta_i].$$

$R_\pi(n_i)$ may be interpreted as the Bayes risk for fixed sample size procedures of sample size $n_i$, where $\pi$ is the a priori probability that $\theta_0$ is the true parameter and $1 - \pi$ is the a priori probability that $\theta_1$ is the true parameter.

In particular,

$$R_\pi(0) = \min_{0 \le \alpha \le 1,\ \alpha + \beta = 1} [\pi\alpha + (1 - \pi)\beta] = \min(\pi, 1 - \pi).$$

If $0 < \pi < \tfrac{1}{2}$, $R_\pi(0) = \pi$. The only test $(\alpha, \beta, 0)$ belonging to $A_0$ satisfying the equation $\pi\alpha + (1 - \pi)\beta = \pi$ is the test $(1, 0, 0)$. Similarly, if $\tfrac{1}{2} < \pi < 1$, $R_\pi(0) = 1 - \pi$. The only test belonging to $A_0$ satisfying the equation $\pi\alpha + (1 - \pi)\beta = 1 - \pi$ is the test $(0, 1, 0)$. If $\pi = \tfrac{1}{2}$, $R_\pi(0) = \tfrac{1}{2}$. Then, any test belonging to $A_0$ satisfies the equation $\tfrac{1}{2}\alpha + \tfrac{1}{2}\beta = \tfrac{1}{2}$, since $\alpha + \beta = 1$.

We note that

$$\min_{(\alpha', \beta', n') \in A^*} [\pi\alpha' + (1 - \pi)\beta' + cn'] = \min_{N \ge 0} \Big\{ cN + \min_{\sum \gamma_i n_i = N} \sum_{i=0}^{\infty} \gamma_i R_\pi(n_i) \Big\}.$$


Step 2. Subject to the conditions

$$\gamma_i \ge 0,\ i = 0, 1, 2, \ldots, \qquad \sum_{i=0}^{\infty} \gamma_i = 1, \qquad \sum_{i=0}^{\infty} \gamma_i n_i = N,$$

we can, for each non-negative value of $N$, choose the $\gamma_i$'s so that $\sum_{i=0}^{\infty} \gamma_i R_\pi(n_i)$ is minimized. To this end, let

$$\mathbf{R}_\pi = \bigcup_{k=0}^{\infty} \{(k, R_\pi(k))\}.$$

Let $\mathbf{R}^*_\pi$ denote the convex hull of $\mathbf{R}_\pi$ and let $\mathcal{R}_\pi$ denote the lower boundary of $\mathbf{R}^*_\pi$, i.e., $\mathcal{R}_\pi = \{(k, r)$: (a) $(k, r) \in \mathbf{R}^*_\pi$ and (b) there exists no point $(k', r')$ belonging to $\mathbf{R}^*_\pi$ such that $k' \le k$, $r' \le r$, at least one of these inequalities being strict$\}$. Then, to accomplish Step 2 of the minimization, given $N \ge 0$, we merely select the point $(N, r)$ belonging to $\mathcal{R}_\pi$. Since $(N, r)$ is a boundary point of a two dimensional convex set, $(N, r)$ can always be expressed as a convex linear combination of at most two points in $\mathbf{R}_\pi$. We define

$$r_\pi(N) = \min_{\sum \gamma_i n_i = N} \sum_{i=0}^{\infty} \gamma_i R_\pi(n_i).$$

See Fig. 2. We note that

$$\min_{(\alpha', \beta', n') \in A^*} [\pi\alpha' + (1 - \pi)\beta' + cn'] = \min_{N \ge 0} [r_\pi(N) + cN].$$


Step 3. We now wish to choose $N \ge 0$ to minimize the expression $r_\pi(N) + cN$. Since $r_\pi(N)$ is a strictly decreasing, convex and piecewise linear function of $N$, there exists at least one value of $N$ and at most a finite interval of values of $N$ which minimize $r_\pi(N) + cN$.

It should be noted that if we are given a specific value of $N$, then there exists a number $c > 0$ such that $r_\pi(N) + cN = \min_k [r_\pi(k) + ck]$. Therefore, for an arbitrary but fixed value of $N > 0$, any procedure obtained in Step 2 will be an admissible mixed single sample test, so that Step 3 is inessential in constructing $\mathcal{A}$.

We shall apply the technique in several problems in the following sections.
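Step 2 amounts to taking the lower convex boundary of the points $(k, R_\pi(k))$ and interpolating along it. The following is a minimal sketch in Python, assuming the Bayes risks have already been tabulated; the helper names `lower_envelope` and `r_of_N` and the risk values are hypothetical and are not taken from the paper.

```python
def lower_envelope(points):
    """Vertices of the lower convex boundary of points (k, R(k)), given with k increasing."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord from hull[-2] to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def r_of_N(hull, N):
    """Return (r(N), mixture), where mixture lists (probability, sample size) pairs."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= N <= x2:
            g = (x2 - N) / (x2 - x1)          # probability of the smaller sample size
            return g * y1 + (1 - g) * y2, [(g, x1), (1 - g, x2)]
    raise ValueError("N lies outside the tabulated range")

# Hypothetical Bayes risks R(0), ..., R(5); illustrative numbers only.
risks = [0.50, 0.48, 0.40, 0.30, 0.24, 0.20]
hull = lower_envelope(list(enumerate(risks)))
value, mixture = r_of_N(hull, 1.5)
print(hull)
print(value, mixture)
```

With these numbers the boundary skips sample sizes 1 and 2, so an expected sample size of 1.5 is best achieved by mixing sample sizes 0 and 3 with equal probability, which illustrates why at most two fixed sample sizes are ever needed.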


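3. Testing the mean of a normal distribution when the variance is known. Let

$$f(x, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{-\frac{1}{2}\Big(\frac{x - \theta}{\sigma}\Big)^2\Big\},$$

where $\sigma > 0$ is known. We wish to test the hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$, $\theta_0 < \theta_1$. It can be shown that for any integer $n \ge 0$,

$$A_n = \{(\alpha, \beta, n): \alpha = 1 - \Phi(t),\ \beta = \Phi(t - \sqrt{n}\,\delta) \text{ for } -\infty \le t \le \infty\},$$

where

$$\delta = \frac{\theta_1 - \theta_0}{\sigma}$$

and

$$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\, dx.$$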
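Step 1. We have already seen that $R_\pi(0) = \min(\pi, 1 - \pi)$. For any integer $n > 0$,

$$(2) \qquad \begin{aligned}
R_\pi(n) &= \min_{\{(\alpha, \beta): (\alpha, \beta, n) \in A_n\}} [\pi\alpha + (1 - \pi)\beta] \\
&= \min_t \{\pi[1 - \Phi(t)] + (1 - \pi)\Phi(t - \sqrt{n}\,\delta)\} \\
&= \pi\Big[1 - \Phi\Big(\frac{\xi}{\sqrt{n}\,\delta} + \frac{\sqrt{n}\,\delta}{2}\Big)\Big] + (1 - \pi)\,\Phi\Big(\frac{\xi}{\sqrt{n}\,\delta} - \frac{\sqrt{n}\,\delta}{2}\Big),
\end{aligned}$$

where $\xi = \log[\pi/(1 - \pi)]$. Furthermore, the test $(\alpha, \beta, n)$ such that

$$\alpha = 1 - \Phi\Big(\frac{\xi}{\sqrt{n}\,\delta} + \frac{\sqrt{n}\,\delta}{2}\Big) \quad \text{and} \quad \beta = \Phi\Big(\frac{\xi}{\sqrt{n}\,\delta} - \frac{\sqrt{n}\,\delta}{2}\Big)$$

is unique. It should also be noted that for any $\pi$ such that $0 < \pi < 1$, $R_\pi(n)$ is a strictly decreasing function of $n$. See Figure 3.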
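The closed form in (2) can be checked against a direct minimization over the cutoff $t$. The following is a minimal sketch in Python using the standard library's `statistics.NormalDist`; the function names and the values $\pi = 0.3$, $\delta = 1$ are illustrative assumptions, not values from the paper.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def bayes_risk_normal(pi, n, delta):
    """Closed form (2) for R_pi(n), n > 0."""
    xi = log(pi / (1 - pi))
    d = sqrt(n) * delta
    return pi * (1 - PHI(xi / d + d / 2)) + (1 - pi) * PHI(xi / d - d / 2)

def bayes_risk_grid(pi, n, delta, steps=20001, lo=-10.0, hi=10.0):
    """Brute-force minimum of pi*(1 - Phi(t)) + (1 - pi)*Phi(t - sqrt(n)*delta) over a grid of t."""
    d = sqrt(n) * delta
    best = 1.0
    for j in range(steps):
        t = lo + (hi - lo) * j / (steps - 1)
        best = min(best, pi * (1 - PHI(t)) + (1 - pi) * PHI(t - d))
    return best

# Illustrative values only.
print(bayes_risk_normal(0.3, 4, 1.0), bayes_risk_grid(0.3, 4, 1.0))
print([round(bayes_risk_normal(0.3, n, 1.0), 4) for n in (1, 2, 3)])
```

The two computations should agree up to the grid resolution, and the second line illustrates the monotone decrease of $R_\pi(n)$ in $n$.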

Step 2. To accomplish Step 2 of the minimization, we consider $R_\pi(n)$ formally as a function of a continuous variable $n$. We shall first show that there exists a number $n_I = n_I(\pi)$ such that $R_\pi(n)$ is concave on the interval $(0, n_I)$ and convex on the interval $(n_I, \infty)$. To show the existence of $n_I$, we use the identities

(a) $\varphi(x - y) = e^{2xy}\varphi(x + y)$,
(b) $\varphi'(x) = -x\varphi(x)$, where $\varphi(x) = \Phi'(x)$.

A routine calculation shows that

$$\text{(c)} \qquad R'_\pi(n) = -\frac{\pi\delta}{2\sqrt{n}}\, \varphi\Big(\frac{\xi}{\sqrt{n}\,\delta} + \frac{\sqrt{n}\,\delta}{2}\Big)$$

and

$$\text{(d)} \qquad R''_\pi(n) = (n^2\delta^4 + 4n\delta^2 - 4\xi^2)\, \frac{\pi}{16 n^{5/2}\delta}\, \varphi\Big(\frac{\xi}{\sqrt{n}\,\delta} + \frac{\sqrt{n}\,\delta}{2}\Big).$$

Setting $R''_\pi(n)$ equal to 0, we find that

$$(3) \qquad n_I = \frac{-2 + 2\sqrt{1 + \xi^2}}{\delta^2}.$$

Therefore, (3) gives a unique inflection point of the function $R_\pi(n)$. See Fig. 3.
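The location of the inflection point in (3) can be checked numerically by looking at the sign of a second difference of $R_\pi(n)$ on either side of $n_I$. The following is a minimal sketch in Python; the function names, the step $h = 0.1$ and the values $\pi = 0.9$, $\delta = 0.5$ are illustrative assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def bayes_risk(pi, n, delta):
    """R_pi(n) from (2), treated as a function of a continuous n > 0."""
    xi = log(pi / (1 - pi))
    d = sqrt(n) * delta
    return pi * (1 - PHI(xi / d + d / 2)) + (1 - pi) * PHI(xi / d - d / 2)

def inflection_point(pi, delta):
    """n_I from (3)."""
    xi = log(pi / (1 - pi))
    return (-2 + 2 * sqrt(1 + xi * xi)) / delta ** 2

def second_difference(pi, n, delta, h=0.1):
    """Central second difference, which has the sign of R''_pi(n) for small h."""
    return bayes_risk(pi, n + h, delta) - 2 * bayes_risk(pi, n, delta) + bayes_risk(pi, n - h, delta)

pi, delta = 0.9, 0.5  # illustrative values, not from the paper
n_I = inflection_point(pi, delta)
print(n_I, second_difference(pi, n_I / 2, delta), second_difference(pi, 2 * n_I, delta))
```

A negative second difference below $n_I$ and a positive one above it is what concavity followed by convexity predicts.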

Since $R_\pi(n)$, defined in (2), is defined only for integral values of $n$, and since $n_I$ in general is not an integer, we assert that there exists an integer $n_0 = n_0(\pi)$ such that $R_\pi(n)$ is concave on the interval $(0, n_0)$ and convex on the interval $(n_0, \infty)$. See Fig. 3. It then follows that

$$r_\pi(N) = \begin{cases} \Big(1 - \dfrac{N}{n_0}\Big) R_\pi(0) + \dfrac{N}{n_0} R_\pi(n_0) & \text{if } N \le n_0, \\[2ex] ([N] + 1 - N)\, R_\pi([N]) + (N - [N])\, R_\pi([N] + 1) & \text{if } N > n_0. \end{cases}$$

Thus Step 2 of the minimization is achieved.
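The piecewise form of $r_\pi(N)$ can be evaluated once $n_0$ is known. The following is a minimal sketch in Python. The rule used here to locate $n_0$, namely the integer minimizing the slope of the chord from $(0, R_\pi(0))$ to $(n, R_\pi(n))$, is an assumption made for illustration and is not stated in the text; the function names and the values $\pi = 0.9$, $\delta = 0.5$ are likewise hypothetical.

```python
from math import floor, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def R(pi, n, delta):
    """Bayes risk R_pi(n), with R_pi(0) = min(pi, 1 - pi)."""
    if n == 0:
        return min(pi, 1 - pi)
    xi = log(pi / (1 - pi))
    d = sqrt(n) * delta
    return pi * (1 - PHI(xi / d + d / 2)) + (1 - pi) * PHI(xi / d - d / 2)

def tangent_sample_size(pi, delta, n_max=1000):
    """Assumed rule: n0 minimizes the slope of the chord from (0, R(0)) to (n, R(n))."""
    r0 = R(pi, 0, delta)
    return min(range(1, n_max + 1), key=lambda n: (R(pi, n, delta) - r0) / n)

def r(pi, N, delta, n0):
    """Piecewise form of r_pi(N) quoted above."""
    if N <= n0:
        return (1 - N / n0) * R(pi, 0, delta) + (N / n0) * R(pi, n0, delta)
    k = floor(N)
    return (k + 1 - N) * R(pi, k, delta) + (N - k) * R(pi, k + 1, delta)

pi, delta = 0.9, 0.5  # illustrative values, not from the paper
n0 = tangent_sample_size(pi, delta)
print(n0, R(pi, 5, delta), r(pi, 5, delta, n0))
```

For an expected sample size below $n_0$, the mixture of "no observations" with the size-$n_0$ test has a smaller Bayes risk than the fixed sample size test of the same size, which is the improvement the text describes.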

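It now becomes clear that improved randomized procedures $(\alpha, \beta, n)$ exist and are of the form $(\alpha, \beta, n) = \gamma(0, 1, 0) + (1 - \gamma)(\alpha_0, \beta_0, n_0)$ or $(\alpha, \beta, n) = \gamma(1, 0, 0) + (1 - \gamma)(\alpha_0, \beta_0, n_0)$, where $0 < \gamma < 1$ and where

$$n_0 = n_0(\pi),$$

$$\alpha_0 = \alpha_0(\pi) = 1 - \Phi\Big(\frac{\xi}{\sqrt{n_0}\,\delta} + \frac{\sqrt{n_0}\,\delta}{2}\Big),$$

$$\beta_0 = \beta_0(\pi) = \Phi\Big(\frac{\xi}{\sqrt{n_0}\,\delta} - \frac{\sqrt{n_0}\,\delta}{2}\Big)$$

for some $\pi$ such that $0 < \pi < 1$.

It also becomes clear that a test $(\alpha, \beta, n) \in A_n$ belongs to $\mathcal{A}$ if and only if $n \ge n_0(\pi)$, where $\pi$ is defined by the equation

$$\beta = \Phi\Big(\frac{\log[\pi/(1 - \pi)]}{\sqrt{n}\,\delta} - \frac{\sqrt{n}\,\delta}{2}\Big).$$

This gives a complete answer to the general question of whether or not a fixed sample size procedure can be improved upon by means of randomization.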


3.1. We now consider the following problem: Given $\alpha$ and $\beta$, how can we find the test in $\mathcal{A}$ achieving the given $\alpha$ and $\beta$? To this end, consider two cases.

Case 1. $\alpha < \beta$. Let $\mathcal{A}_0 = \{(\alpha_0, \beta_0, n_0): n_0 = n_0(\pi),\ \beta_0 = \beta_0(\pi),\ \alpha_0 = \alpha_0(\pi) \text{ for } \tfrac{1}{2} < \pi < 1\}$.

Let $(\alpha, \beta, n)$ denote the test in $\mathcal{A}$ with the given $\alpha$ and $\beta$.

From the discussion of Step 2, it is evident that $(\alpha, \beta, n)$ is an improved randomized procedure if and only if $(\alpha, \beta, n) = \gamma(0, 1, 0) + (1 - \gamma)(\alpha_0, \beta_0, n_0)$, where $0 < \gamma < 1$ and where $(\alpha_0, \beta_0, n_0) \in \mathcal{A}_0$. In this case, $\alpha = (1 - \gamma)\alpha_0$, $\beta = \gamma + (1 - \gamma)\beta_0$, $n = (1 - \gamma)n_0$. These equations imply that $\alpha/(1 - \beta) = \alpha_0/(1 - \beta_0)$ and $1 - \gamma = \alpha/\alpha_0$. The equation $\alpha/(1 - \beta) = \alpha_0/(1 - \beta_0)$, when interpreted geometrically, means that the points $(0, 1)$, $(\alpha, \beta)$ and $(\alpha_0, \beta_0)$ are collinear. The equation $1 - \gamma = \alpha/\alpha_0$, when interpreted geometrically, means that $(\alpha, \beta)$ is between $(0, 1)$ and $(\alpha_0, \beta_0)$.

Fig. 4. Shaded region corresponds to the set of $(\alpha, \beta)$ for which the admissible test $(\alpha, \beta, n)$ is a randomized procedure; $B_i = \{(\alpha_0, \beta_0): n_0 = i\}$, $\bigcup_i B_i = P(\mathcal{A}_0 \mid \alpha, \beta)$.
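The relations between a mixture with $(0, 1, 0)$ and its fixed sample size component can be verified with exact arithmetic. The following is a minimal sketch in Python; the function name `mix_with_accept` and the numbers chosen for $(\alpha_0, \beta_0, n_0)$ and $\gamma$ are hypothetical and serve only to exercise the identities.

```python
from fractions import Fraction as F

def mix_with_accept(gamma, alpha0, beta0, n0):
    """Operating characteristic of gamma*(0, 1, 0) + (1 - gamma)*(alpha0, beta0, n0)."""
    return ((1 - gamma) * alpha0,
            gamma + (1 - gamma) * beta0,
            (1 - gamma) * n0)

# Hypothetical fixed sample size test and mixing probability.
alpha0, beta0, n0 = F(1, 10), F(1, 5), 20
gamma = F(1, 4)
alpha, beta, n = mix_with_accept(gamma, alpha0, beta0, n0)
print(alpha, beta, n)

# Collinearity of (0, 1), (alpha, beta), (alpha0, beta0) and recovery of gamma.
print(alpha / (1 - beta) == alpha0 / (1 - beta0), 1 - gamma == alpha / alpha0)
```

Reading the second identity backwards gives the construction used in this case: once $(\alpha_0, \beta_0)$ has been located on the ray from $(0, 1)$ through $(\alpha, \beta)$, the mixing probability is $\gamma = 1 - \alpha/\alpha_0$.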


If $(\alpha, \beta, n)$ is not an improved randomized procedure, then

$$(\alpha, \beta, n) = \gamma(\alpha_1, \beta_1, [n]) + (1 - \gamma)(\alpha_2, \beta_2, [n] + 1),$$

where $0 \le \gamma \le 1$ and where $(\alpha_1, \beta_1, [n])$ and $(\alpha_2, \beta_2, [n] + 1) \in A$, and is of little interest.

We summarize the preceding as follows: Let $P(\mathcal{A}_0 \mid \alpha, \beta)$ denote the projection of $\mathcal{A}_0$ on the $(\alpha, \beta)$ plane. See Fig. 4. It was convenient to let $\delta = 1$. If $(\alpha, \beta)$ lies on a line segment joining $(0, 1)$ to one of the points $(\alpha_0, \beta_0)$ in $P(\mathcal{A}_0 \mid \alpha, \beta)$, then the test $(\alpha, \beta, n) = (1 - \alpha/\alpha_0)(0, 1, 0) + (\alpha/\alpha_0)(\alpha_0, \beta_0, n_0)$ is the test in $\mathcal{A}$ with the given $\alpha$ and $\beta$. Otherwise, $(\alpha, \beta, n)$ is achieved by randomizing over two fixed sample size procedures, one in $A_{[n]}$ and the other in $A_{[n]+1}$.

Case 2. $\alpha > \beta$. Similar to Case 1.

Table 1 shows the improvement in the expected sample size $N$ which can be achieved for selected tests $(\alpha, \beta, n)$ belonging to $A_n - \mathcal{A}$. In this case, we let $\delta = .1$.


TABLE 1

    n       N      Percent saving
   221     119
   383     287
   147      84
   585      57
   134     123
    28      20

n: sample size of the admissible single sample test achieving the given $\alpha$ and $\beta$.
N: expected sample size of the admissible mixed single sample test achieving the given $\alpha$ and $\beta$.
Percent saving: $\frac{n - N}{n} \times 100$.


3.2. Consider next the following problem: Given $\alpha$ and $n$, how can we construct the test in $\mathcal{A}$ having the given $\alpha$ and $n$? We solve this problem geometrically. Let $P(\mathcal{A}_0 \mid \alpha, n)$ denote the projection of the set $\mathcal{A}_0$ on the $(\alpha, n)$ plane. See Fig. 5. Then, draw a line of slope $n/\alpha$ through the origin. Determine the point of intersection $(\alpha_0, n_0)$ of this line and $P(\mathcal{A}_0 \mid \alpha, n)$. Clearly, $n/\alpha = n_0/\alpha_0$. If $\alpha_0 > \alpha$, the test in $\mathcal{A}$ having the given $\alpha$ and $n$ is the mixture

$$\frac{\alpha}{\alpha_0}(\alpha_0, \beta_0, n_0) + \Big(1 - \frac{\alpha}{\alpha_0}\Big)(0, 1, 0).$$

If $\alpha_0 \le \alpha$, the test in $\mathcal{A}$ having the given $\alpha$ and $n$ is a mixture of two tests, one in $A_{[n]}$ and the other in $A_{[n]+1}$, and hence is of little interest.

Fig. 5. Shaded region corresponds to the set of $(\alpha, n)$ for which the admissible test $(\alpha, \beta, n)$ is a randomized procedure; $C_i = \{(\alpha_0, n_0): n_0 = i\}$, $\bigcup_i C_i = P(\mathcal{A}_0 \mid \alpha, n)$.

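4. Tests on the mean of a binomial distribution. Let

$$f(x, \theta) = \theta^x(1 - \theta)^{1 - x} \ \text{ if } x = 0, 1, \qquad = 0 \ \text{ elsewhere}, \qquad 0 < \theta < 1.$$

We wish to test the hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$, $\theta_1 > \theta_0$. It is known that

$$A_n = \Big\{(\alpha, \beta, n): \alpha = \sum_{i=0}^{n+1} \gamma_i \alpha_i,\ \beta = \sum_{i=0}^{n+1} \gamma_i \beta_i,\ \gamma_i \ge 0,\ \sum_{i=0}^{n+1} \gamma_i = 1,\ \gamma_i = 0 \text{ except for at most two adjacent indices}\Big\},$$

where

$$\alpha_i = \sum_{r=i}^{n} \binom{n}{r} \theta_0^r (1 - \theta_0)^{n - r}, \qquad \beta_i = \sum_{r=0}^{i-1} \binom{n}{r} \theta_1^r (1 - \theta_1)^{n - r}, \qquad i = 0, 1, 2, \ldots, n + 1.$$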


Howard Raiffa [2] has pointed out that if we consider the projections of $A_1$ and $A_2$ on the $(\alpha, \beta)$ plane, there exists a test in $A_2$ whose operating characteristic is $(\theta_0, 1 - \theta_1, 2)$. However, there exists a test in $A_1$ whose operating characteristic is $(\theta_0, 1 - \theta_1, 1)$. Hence $(\theta_0, 1 - \theta_1, 2) \notin \mathcal{A}$. See Fig. 6. Furthermore, if $\pi$ is such that

$$\pi\,\theta_0(1 - \theta_0) = (1 - \pi)\,\theta_1(1 - \theta_1),$$

then $R_\pi(1) = R_\pi(2)$.
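Raiffa's observation and the equality $R_\pi(1) = R_\pi(2)$ can be checked with exact arithmetic. The following is a minimal sketch in Python, assuming that $R_\pi(n)$ is attained at one of the non-randomized tests "reject when the number of successes is at least $i$"; the function names and the values $\theta_0 = 1/25$, $\theta_1 = 3/20$ are illustrative choices.

```python
from fractions import Fraction as F
from math import comb

def vertices(n, theta0, theta1):
    """(alpha_i, beta_i) for the tests rejecting when at least i successes occur, i = 0..n+1."""
    pts = []
    for i in range(n + 2):
        a = sum(comb(n, r) * theta0**r * (1 - theta0)**(n - r) for r in range(i, n + 1))
        b = sum(comb(n, r) * theta1**r * (1 - theta1)**(n - r) for r in range(0, i))
        pts.append((a, b))
    return pts

def bayes_risk(pi, n, theta0, theta1):
    """R_pi(n): a linear function on a convex polygon is minimized at a vertex."""
    return min(pi * a + (1 - pi) * b for a, b in vertices(n, theta0, theta1))

theta0, theta1 = F(1, 25), F(3, 20)  # illustrative values
# Solve pi*theta0*(1 - theta0) = (1 - pi)*theta1*(1 - theta1) for pi.
pi = theta1 * (1 - theta1) / (theta0 * (1 - theta0) + theta1 * (1 - theta1))

v2 = vertices(2, theta0, theta1)
midpoint = ((v2[1][0] + v2[2][0]) / 2, (v2[1][1] + v2[2][1]) / 2)
print(midpoint == (theta0, 1 - theta1))
print(bayes_risk(pi, 1, theta0, theta1) == bayes_risk(pi, 2, theta0, theta1))
```

The midpoint check reproduces the sample-size-2 test with operating characteristic $(\theta_0, 1 - \theta_1, 2)$, and the second check shows that for this prior the second observation does not reduce the Bayes risk.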





Fig. 6
TABLE 2

  $\alpha$     n     $\beta$ (single)   $\beta$ (randomized)   Percent decrease
  .512    30        .029               .018                 38
  .361    20        .112               .090                 19
  .098    20        .320               .286                 11
  .350    40        .030               .024                 20

$\beta$ (single): probability of a type II error of the admissible single sample test having the given $\alpha$ and $n$.
$\beta$ (randomized): probability of a type II error of the randomized test having the given $\alpha$ and $n$.

Unlike the normal distribution, there does not exist an integer $n_0(\pi)$ such that $R_\pi(n)$ is concave on the interval $(0, n_0(\pi))$ and convex on the interval $(n_0(\pi), \infty)$. Rather, it was found by numerical calculation that $R_\pi(n)$ has many inflection points. Thus, we do not generalize any further and present the following examples.

Example I. Let $\theta_0 = .04$, $\theta_1 = .15$. Table 2 shows the percent decrease in the probability of a type II error that randomization achieves over fixed sample size procedures for the given $\alpha$ and $n$. Since $R_\pi(n)$ was calculated only for values of $n$ of the form $n = 5k$, where $k$ is a non-negative integer, it cannot be said with certainty that the improvements shown in Table 2 are optimal. However, the optimal improvements are at least as great as the ones recorded.

Example II. We again wish to test the hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$, where $\theta_0 < \theta_1$. Then, it is well known that any test $(\alpha, \beta, 1)$ such that

$$(\alpha, \beta, 1) = \gamma(0, 1, 1) + (1 - \gamma)(\theta_0, 1 - \theta_1, 1),$$

where $0 \le \gamma \le 1$, belongs to $A_1$. We shall now show that if we are given a test $(\alpha, \beta, 1)$ of the above type such that $\gamma \le 1 - \theta_1/2$, then there exists a mixed single sample test $(\alpha^*, \beta, 1)$ such that

$$(4) \qquad \frac{\alpha - \alpha^*}{\alpha} = \frac{\gamma(1 - \alpha - \beta)}{(1 - \gamma)(1 + \beta - 2\gamma)}.$$

The expression $(\alpha - \alpha^*)/\alpha$ is interpreted as the fractional saving in $\alpha$ achieved by randomization.

To prove this, consider the test

$$(\alpha^*, \beta', n') = \frac{\gamma}{2 - \theta_1}(0, 1, 0) + \frac{\gamma}{2 - \theta_1}(\theta_0^2, 1 - \theta_1^2, 2) + \Big(1 - \frac{2\gamma}{2 - \theta_1}\Big)(\theta_0, 1 - \theta_1, 1).$$

Since $\gamma \le 1 - \theta_1/2$, the above test is a bona fide mixture. It is easily verified that $\beta' = \beta$, $n' = 1$ and that $(\alpha - \alpha^*)/\alpha$ has the value given in (4).
To illustrate the fractional saving in $\alpha$ which can be achieved, consider the test $H_0: \theta = .10$ against $H_1: \theta = .95$. Then, there exists a test $(\alpha, \beta, 1)$ in $A_1$ where $(\alpha, \beta, 1) = .5(0, 1, 1) + .5(.10, .05, 1) = (.05, .525, 1)$. Consider the test

$$(\alpha^*, \beta, 1) = \frac{.5}{2 - .95}(0, 1, 0) + \frac{.5}{2 - .95}(.01, .0975, 2) + \Big(1 - \frac{2(.5)}{2 - .95}\Big)(.10, .05, 1) = \Big(\frac{1}{105}, .525, 1\Big).$$

Then

$$\frac{\alpha - \alpha^*}{\alpha} = \frac{17}{21}.$$

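5. Tests on the range of a rectangular distribution when one endpoint is known. Let

$$f(x, \theta) = \frac{1}{\theta} \ \text{ if } 0 \le x \le \theta, \qquad = 0 \ \text{ elsewhere.}$$

We wish to test the hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta = \theta_1$, $\theta_1 > \theta_0$. It can be shown that

$$(5) \qquad A_n = \Big\{(\alpha, \beta, n): \alpha = \frac{\theta_0^n - t^n}{\theta_0^n},\ \beta = \frac{t^n}{\theta_1^n},\ 0 \le t \le \theta_0\Big\} = \Big\{(\alpha, \beta, n): 0 \le \alpha \le 1,\ \beta = \Big(\frac{\theta_0}{\theta_1}\Big)^n (1 - \alpha)\Big\}.$$

It should be noted that Theorem 2 does not hold, since $f(x, \theta_0)$ and $f(x, \theta_1)$ do not vanish simultaneously for values of $x$ such that $\theta_0 < x \le \theta_1$. Hence, we shall alter our approach to generating $\mathcal{A}$ by proving a theorem which will yield as a consequence a technique for constructing $\mathcal{A}$.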
THEOREM 3. If $(\alpha, \beta, n) \in A_n$ where $\alpha > 0$ and $n > 0$, then $(\alpha, \beta, n) \notin \mathcal{A}$.

PROOF. If $(\alpha, \beta, n) \in A_n$, then it follows from (5) that $\beta = (\theta_0/\theta_1)^n(1 - \alpha)$. Consider the test

$$(\alpha', \beta', n') = \alpha(1, 0, 0) + (1 - \alpha)\Big(0, \Big(\frac{\theta_0}{\theta_1}\Big)^n, n\Big) = \Big(\alpha, (1 - \alpha)\Big(\frac{\theta_0}{\theta_1}\Big)^n, (1 - \alpha)n\Big) = (\alpha, \beta, (1 - \alpha)n).$$

Clearly $(\alpha, \beta, (1 - \alpha)n)$ is preferred to $(\alpha, \beta, n)$.

Theorem 3 states that all single sample tests $(\alpha, \beta, n)$ such that $0 < \alpha \le 1$ and $n > 0$ are inadmissible in the class of mixed single sample tests. Consequently, the class $\mathcal{A}$ can be generated by the test $(1, 0, 0)$ and the sequence of tests $\{(0, (\theta_0/\theta_1)^k, k)\}$, $k = 0, 1, 2, \ldots$.

Since $(\theta_0/\theta_1)^n$ is a convex function of $n$, it can be shown that $(\alpha, \beta, n) \in \mathcal{A}$ if and only if $(\alpha, \beta, n) = \gamma_1(1, 0, 0) + \gamma_2(0, (\theta_0/\theta_1)^k, k) + \gamma_3(0, (\theta_0/\theta_1)^{k+1}, k + 1)$ for some non-negative numbers $\gamma_1, \gamma_2, \gamma_3$ and some non-negative integer $k$, where $\sum_{i=1}^{3} \gamma_i = 1$. In fact it is easily verified that $k = [n/(1 - \alpha)]$, $\gamma_1 = \alpha$, $\gamma_2 = (1 - \alpha)([n/(1 - \alpha)] + 1) - n$ and $\gamma_3 = n - (1 - \alpha)[n/(1 - \alpha)]$. See Fig. 7.
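The explicit weights make the admissible test easy to construct from $\alpha$ and $n$. The following is a minimal sketch in Python for $\alpha < 1$; the function name `admissible_mixture` and the values $\alpha = 1/4$, $\theta_0/\theta_1 = 1/2$ are illustrative assumptions.

```python
from fractions import Fraction as F
from math import floor

def admissible_mixture(alpha, n, ratio):
    """Weights on (1,0,0), (0, ratio**k, k), (0, ratio**(k+1), k+1) and the resulting beta.

    ratio stands for theta0/theta1; alpha must be smaller than 1.
    """
    k = floor(n / (1 - alpha))
    g1 = alpha
    g2 = (1 - alpha) * (k + 1) - n
    g3 = n - (1 - alpha) * k
    beta = g2 * ratio**k + g3 * ratio**(k + 1)
    return k, (g1, g2, g3), beta

alpha, ratio = F(1, 4), F(1, 2)  # illustrative values
k, weights, beta_mixed = admissible_mixture(alpha, 3, ratio)
beta_fixed = ratio**3 * (1 - alpha)          # from (5), fixed sample size 3
print(k, weights, beta_mixed, beta_fixed)

# A non-integral expected sample size is handled the same way.
print(admissible_mixture(alpha, F(7, 2), ratio))
```

For the same $\alpha$ and the same expected sample size 3, the mixed test has half the type II error probability of the fixed sample size test in this illustration.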

COROLLARY 1. If $(\alpha, \beta, n) \in A_n$, there exists a test $(\alpha, \beta, n') \in \mathcal{A}$ where $n' = (1 - \alpha)n$.

PROOF. From the preceding discussion, the test $(\alpha, \beta, n') = \alpha(1, 0, 0) + (1 - \alpha)(0, (\theta_0/\theta_1)^n, n) \in \mathcal{A}$. Since $n' = (1 - \alpha)n$, the desired conclusion follows.

We note that the fractional saving in the expected number of observations obtained by randomization is equal to $\alpha$, i.e.,

$$\frac{n - n'}{n} = \frac{n - (1 - \alpha)n}{n} = \alpha.$$

6. Tests on the mean of a rectangular distribution when the range is known. Let

$$f(x, \theta) = 1 \ \text{ if } \theta < x < \theta + 1, \qquad = 0 \ \text{ elsewhere.}$$

We wish to test the hypothesis $H_0: \theta = 0$ against the alternative $H_1: \theta = \theta_1$, where $0 < \theta_1 < 1$. A simple calculation shows that

$$A_n = \{(\alpha, \beta, n): \alpha = (1 - t)^n,\ \beta = (1 - \theta_1)^n - (1 - t)^n,\ \theta_1 \le t \le 1\} = \{(\alpha, \beta, n): 0 \le \alpha \le (1 - \theta_1)^n,\ \alpha + \beta = (1 - \theta_1)^n\}.$$

See Fig. 8. Let $R_\pi(n) = \min_{(\alpha, \beta, n) \in A_n} [\pi\alpha + (1 - \pi)\beta] = \min[\pi(1 - \theta_1)^n, (1 - \pi)(1 - \theta_1)^n] = (1 - \theta_1)^n \min(\pi, 1 - \pi)$. Obviously $R_\pi(n)$ is a convex function of $n$. It follows that $A_n \subset \mathcal{A}$. In other words, all admissible fixed sample size tests are admissible in the class of mixed single sample tests.
Fig. 8


7. Confidence interval estimation. We next wish to extend the notion of mixed 
single sample procedures to confidence interval estimation. Perhaps this purpose 
can best be served by an illustrative example. 

Example. Let X denote a normally distributed random variable with unknown
mean μ and known variance σ². (There is no loss in generality if we assume that
σ = 1, and we shall do so for the remainder of this section.) We wish to consider
the problem of obtaining a confidence interval for μ. The standard procedure
consists of

(a) choosing a number α between 0 and 1, called the confidence coefficient.
(b) calculating a number t using the equation α = 1 − 2Φ(−t).
(c) drawing a sample of n independent observations on X and calculating
X̄, the sample mean.
(d) making the statement that the interval (X̄ − t/√n, X̄ + t/√n) covers
μ with confidence α.

A confidence interval procedure is evaluated in terms of a triple (1 − α, L, n)
where 1 − α denotes the probability that the confidence interval will not cover
μ, L denotes the length of the confidence interval and n denotes the sample size.
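Steps (a) to (d) and the resulting triple can be sketched as follows, assuming σ = 1 as in the text. The function names are hypothetical; `statistics.NormalDist` from the Python standard library supplies Φ and its inverse.

```python
from statistics import NormalDist


def t_for_confidence(alpha):
    """Solve alpha = 1 - 2*Phi(-t) for t, i.e. Phi(t) = (1 + alpha)/2."""
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)


def confidence_interval(sample, alpha):
    """Interval (xbar - t/sqrt(n), xbar + t/sqrt(n)) for unit variance."""
    n = len(sample)
    xbar = sum(sample) / n
    half = t_for_confidence(alpha) / n ** 0.5
    return (xbar - half, xbar + half)


def operating_characteristic(alpha, n):
    """Triple (1 - alpha, L, n) with L = 2t/sqrt(n)."""
    return (1.0 - alpha, 2.0 * t_for_confidence(alpha) / n ** 0.5, n)
```

For example, `operating_characteristic(0.95, 4)` describes the familiar interval of half-width about 1.96/2 around the mean of four observations.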

We will now exploit the notion of randomizing over the sample size in
confidence interval estimation using an approach similar to the one used in Section
2. For integral values of n ≥ 1, we let

\[ A_n = \Bigl\{(1-\alpha, L, n) : \alpha = 1 - 2\Phi(-t),\ L = \frac{2t}{\sqrt{n}},\ 0 \le t < \infty \Bigr\}
       = \Bigl\{(1-\alpha, L, n) : \alpha = 1 - 2\Phi\Bigl(-\frac{L\sqrt{n}}{2}\Bigr) \Bigr\}. \]

We define A_0 = {(1, 0, 0)}.
As an analogue of the Bayes risk R_π(n), we consider

\[ R_p(n) = \min_{(1-\alpha, L, n) \in A_n} [\,p(1-\alpha) + (1-p)L\,]. \]

A routine calculation shows that

\[ R_p(n) = p \quad \text{if } 0 \le n \le 2\pi(1-p)^2 p^{-2}, \]

\[ R_p(n) = 2p\,\Phi\Bigl(-\sqrt{\log \frac{p^2 n}{2\pi(1-p)^2}}\,\Bigr)
          + \frac{2(1-p)}{\sqrt{n}} \sqrt{\log \frac{p^2 n}{2\pi(1-p)^2}}
   \quad \text{if } n > 2\pi(1-p)^2 p^{-2}. \]

See Fig. 9.


If we treat R_p(n) as a function of a continuous variable n, we find that

\[ R_p''(n) = 0 \quad \text{if } n < \frac{1}{c}, \]

\[ R_p''(n) = -\frac{(1-p)(1 - 3\log cn)}{2\,n^{5/2}\sqrt{\log cn}} \quad \text{if } n > \frac{1}{c}, \]

where c = p²/[2π(1 − p)²]. As in Section 3, there exists a non-negative number
n_1 = n_1(p) such that R_p(n) is concave on the interval (0, n_1) and convex on the
interval (n_1, ∞). In fact, n_1 = [2π(1 − p)²]p^{−2}e^{1/3}. Using an argument similar
to the one used in Section 3, it becomes clear that "improved" mixed confidence
interval procedures exist and are of the form

(α, L, n) = γ(0, 0, 0) + (1 − γ)(α', L', n'),

where 0 < γ < 1 and (α', L', n') is a fixed sample size confidence interval
procedure.
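The shape of R_p(n) can be explored numerically. The sketch below is an illustration under stated assumptions: it uses the closed form obtained by minimizing 2pΦ(−t) + 2(1 − p)t/√n over t ≥ 0 (the minimizing t satisfies t² = log cn), compares it with a brute-force grid search, and computes the inflection point e^{1/3}/c. All function names and grid parameters are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def risk_closed_form(p, n):
    """R_p(n): equals p up to n = 1/c, then uses the interior minimizer t = sqrt(log(c*n))."""
    c = p * p / (2.0 * math.pi * (1.0 - p) ** 2)
    if n <= 1.0 / c:
        return p
    t = math.sqrt(math.log(c * n))
    return 2.0 * p * PHI(-t) + 2.0 * (1.0 - p) * t / math.sqrt(n)


def risk_grid_search(p, n, t_max=10.0, steps=100000):
    """Brute-force minimum of p*(1 - alpha) + (1 - p)*L over a grid of t values (n >= 1)."""
    best = p  # value at t = 0, where 1 - alpha = 1 and L = 0
    for j in range(1, steps + 1):
        t = t_max * j / steps
        best = min(best, 2.0 * p * PHI(-t) + 2.0 * (1.0 - p) * t / math.sqrt(n))
    return best


def inflection_point(p):
    """n_1 = 2*pi*(1 - p)**2 * exp(1/3) / p**2, where concavity turns into convexity."""
    return 2.0 * math.pi * (1.0 - p) ** 2 * math.exp(1.0 / 3.0) / p ** 2
```

A chord drawn from (0, p) to a point on the convex part of the curve lies below the concave part, which is the geometric reason a mixture with the no-observation procedure can improve on a fixed sample size.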







TABLE 3 


Length of fixed sample size (Expected length of randomized 
procedure having the given | confidence interval procedure 
aand having the given a and » 


Confidence Expected 


Percent decrease in 
coefficient a sample size n 


the expected length 


.044 .110 .037 
392 me * 329 


174 2 146 


Table 3 gives some examples of admissible mixed single sample procedures and 
improvements which can be obtained in the expected length of a confidence 
interval if a mixing scheme is used. 

Improved randomized confidence intervals are of such a nature that certain
questions are brought to mind. First, how much "confidence" can we place in
randomized confidence intervals? It is true that a confidence interval of the form
(α, L, n) = γ(0, 0, 0) + (1 − γ)(α', L', n') will cover μ 100α% of the time, will
have average length L and will have expected sample size n. However, if we are
given confidence interval (0, 0, 0), we no longer have confidence α that we are
covering μ. On the other hand, if we are given the confidence interval
(X̄ − L'/2, X̄ + L'/2), we have confidence α' > α that we are covering μ.
Furthermore, if a statistician uses a mixed procedure and does not tell this to his
customers, then his customers can have confidence α, unless, of course, they are
given the procedure (0, 0, 0). (However, if we restrict ourselves to procedures
where the sample size n is at least 1, then they could still have confidence α.)
In other words, by withholding information from his customers, the statistician
gives them confidence α. By giving them information, he either reduces their
confidence to 0, or increases their confidence to α'.

This is not the only example of such a situation in statistical techniques. Take,
for example, the Stein two sample procedure for finding a confidence interval
(of fixed length l and confidence coefficient α) for the mean of a normal
distribution with unknown variance. A sample of n_0 observations is taken and the
sample variance S_0² is calculated. Then, an additional n_1 observations are taken
where

\[ n_1 = \max\Bigl\{ n_0,\ \Bigl[\frac{S_0^2}{d}\Bigr] + 1 \Bigr\} - n_0, \]

where d depends on α and l. The two samples are then combined, the mean X̄
of the combined samples is calculated and the confidence interval
(X̄ − l/2, X̄ + l/2) is given. Now, if it turns out that the variance S² of the
combined samples is much larger than S_0², one is led to believe that the second
sample size was not large enough. Thus, one's confidence of α might be reduced,
given this information. However, if one did not have this information about S²,
then one's confidence would still be α. This situation is indeed similar to the
preceding one.

Another peculiarity of mixed single sample confidence interval procedures is
that we get short length only when we do not cover μ. This immediately brings to
mind the question of average length as a criterion for a confidence interval
procedure. It is clear that small length is desirable if μ is being covered. What
one wants when μ is not covered is open to question. Clearly, we can agree that
procedures which give small length when μ is not covered and large length when
μ is covered are not desirable ones. Randomized procedures are of this nature.


8. The k decision problem. Let X denote a random variable with distribution
function F(x, θ). Instead of considering only two possible values of θ, θ_0 and
θ_1, as we did in the previous section, we now consider k possible values of θ.
Let θ_1, θ_2, ..., θ_k denote the k possible values of θ. We assume that
θ_1 < θ_2 < ... < θ_k. For any fixed sample size decision rule δ_n, based on
samples of size n, let α_i(δ_n) denote the probability that θ_i will not be selected
as the true value of θ when θ_i is the true value of θ if the decision rule δ_n is
used. Every fixed sample size decision rule is then identified with an operating
characteristic (α_1, α_2, ..., α_k, n) where α_i = α_i(δ_n) for i = 1, 2, ..., k and
where n denotes the sample size. The classes S_n, A_n, A, A* and @ are defined in
an obvious way and the functions R_π(n) and r_π(n) are defined as in Section 2
where π = (π_1, π_2, ..., π_k), π_i ≥ 0 and Σ_{i=1}^{k} π_i = 1. We can then extend
all the results obtained in Section 2 to the k decision problem.

In the particular case

\[ F(x, \theta) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}
   \exp\Bigl\{-\frac{1}{2}\Bigl(\frac{t - \theta}{\sigma}\Bigr)^2\Bigr\}\,dt, \]

where σ > 0 is known, we shall show that it is possible to obtain improvements
by randomization. For each positive integral value of n, an essentially complete
class of decision rules, C_n, can be generated in the following way: Let
(x_1, x_2, ..., x_n) denote a sample of n independent observations on X and
let (t_0, t_1, ..., t_k) denote a partition of the real line such that t_i ≤ t_{i+1},
i = 0, 1, ..., k − 1. In particular, t_0 = −∞ and t_k = ∞. Then any procedure
which selects θ_i as the true value of θ whenever t_{i−1} < X̄ < t_i is called a
monotone procedure. Let C_n denote the class of all monotone procedures. The class
C_n is known to be essentially complete.
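The operating characteristic of a monotone procedure follows directly from the normal distribution of X̄. The sketch below is illustrative only: the function name, the argument layout and the numerical example are assumptions, and the cut points passed in are the interior ones t_1, ..., t_{k−1}.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def error_probabilities(cuts, thetas, n, sigma=1.0):
    """Return [alpha_1, ..., alpha_k] for the monotone rule with interior cuts t_1..t_{k-1}.

    alpha_i is the probability that the sample mean falls outside (t_{i-1}, t_i)
    when theta_i is the true mean.
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    root_n = math.sqrt(n)
    alphas = []
    for i, theta in enumerate(thetas):
        lower = PHI(root_n * (bounds[i] - theta) / sigma)
        upper = PHI(root_n * (bounds[i + 1] - theta) / sigma)
        alphas.append(1.0 - (upper - lower))
    return alphas
```

Weighting the returned values by π_1, ..., π_k and minimizing over the cut points gives the risk R_π(n) studied next.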

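By definition,

\[ R_\pi(n) = \min_{(\alpha_1, \alpha_2, \ldots, \alpha_k, n) \in A_n} \sum_{i=1}^{k} \pi_i \alpha_i
            = \min_{(t_1, t_2, \ldots, t_{k-1})} \sum_{i=1}^{k} \pi_i \alpha_i \]

\[ \phantom{R_\pi(n)} = \min_{(t_1, t_2, \ldots, t_{k-1})} \sum_{i=1}^{k} \pi_i
   \Bigl[ 1 - \Phi\Bigl(\frac{\sqrt{n}\,(t_i - \theta_i)}{\sigma}\Bigr)
            + \Phi\Bigl(\frac{\sqrt{n}\,(t_{i-1} - \theta_i)}{\sigma}\Bigr) \Bigr], \]

where t_0 = −∞ and t_k = ∞.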
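Considering R_π(n) as a function of a continuous variable n, we find

\[ \text{(a)} \quad R_\pi'(n) = \sum_{i=2}^{k} \frac{\pi_{i-1}\,\delta_{i-1}}{2\sqrt{n}}\,
   \varphi\Bigl(\frac{\sqrt{n}\,\delta_{i-1}}{2} + \frac{c_{i-1}}{\sqrt{n}\,\delta_{i-1}}\Bigr), \]

\[ \text{(b)} \quad R_\pi''(n) = -\sum_{i=2}^{k} \frac{\pi_{i-1}}{16\,\delta_{i-1}\,n^{5/2}}\,
   \varphi\Bigl(\frac{\sqrt{n}\,\delta_{i-1}}{2} + \frac{c_{i-1}}{\sqrt{n}\,\delta_{i-1}}\Bigr)
   \bigl(\delta_{i-1}^4 n^2 + 4\delta_{i-1}^2 n - 4c_{i-1}^2\bigr), \]

where δ_{i−1} = (θ_{i−1} − θ_i)/σ, c_{i−1} = log(π_{i−1}/π_i) and φ denotes the
standard normal density. For each value of i = 2, 3, ..., k, the function
f_i(n) = δ_{i−1}⁴n² + 4δ_{i−1}²n − 4c_{i−1}² is a quadratic function of n. Since the
only non-negative root of the equation f_i(n) = 0 is

\[ n_i = \frac{-2 + 2\sqrt{1 + c_{i-1}^2}}{\delta_{i-1}^2}, \]

it follows that

f_i(n) ≤ 0 if 0 ≤ n ≤ n_i,
f_i(n) > 0 if n > n_i.

Then, since δ_{i−1} < 0 for i = 2, 3, ..., k, it follows that

R_π''(n) < 0 if n < min_i (n_i)

and

R_π''(n) > 0 if n > max_i (n_i).

Hence, if we let a = min_i (n_i) and b = max_i (n_i), it follows that R_π(n) is concave
on the interval (0, a) and convex on the interval (b, ∞). Clearly, a ≤ b.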

Thus, for certain values of π_1, π_2, ..., π_k and θ_1, θ_2, ..., θ_k, it is possible to
achieve improvements by randomization.
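The interval endpoints a and b can be computed directly from the roots n_i. The sketch below assumes the definitions δ_{i−1} = (θ_{i−1} − θ_i)/σ and c_{i−1} = log(π_{i−1}/π_i), which are this illustration's reading of the quantities in the quadratic; the function names and the numerical values are hypothetical.

```python
import math


def concavity_thresholds(thetas, priors, sigma=1.0):
    """Return [n_2, ..., n_k] with n_i = (-2 + 2*sqrt(1 + c**2)) / delta**2.

    Assumed definitions: delta = (theta_{i-1} - theta_i)/sigma,
    c = log(pi_{i-1}/pi_i).
    """
    roots = []
    for i in range(1, len(thetas)):
        delta = (thetas[i - 1] - thetas[i]) / sigma
        c = math.log(priors[i - 1] / priors[i])
        roots.append((-2.0 + 2.0 * math.sqrt(1.0 + c * c)) / (delta * delta))
    return roots


def f_quadratic(n, delta, c):
    """f(n) = delta**4 * n**2 + 4*delta**2 * n - 4*c**2."""
    return delta ** 4 * n * n + 4.0 * delta ** 2 * n - 4.0 * c * c


# Assumed illustrative values: three equally spaced means and unequal weights.
roots = concavity_thresholds((0.0, 1.0, 2.0), (0.7, 0.2, 0.1))
a, b = min(roots), max(roots)
```

When all the weights are equal every c_{i−1} is zero, so a = b = 0 and the concave stretch disappears; unequal weights are what open the door to randomization here.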


9. Testing a composite hypothesis against a composite alternative. We next
wish to extend the notion of mixed single sample tests to the problem of testing a
composite hypothesis against a composite alternative. To fix ideas, let

\[ f(x, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}
   \exp\Bigl\{-\frac{1}{2}\Bigl(\frac{x - \theta}{\sigma}\Bigr)^2\Bigr\}, \]

where σ > 0 is known. We wish to test the hypothesis H_0: θ ≤ θ_0 against the
alternative H_1: θ ≥ θ_1, θ_1 > θ_0. If we are given α and n, the "best" fixed sample
size test of level α and size n is obtained by using the best fixed sample size test of
H_0: θ = θ_0 against H_1: θ = θ_1 corresponding to the given α and n. The resultant
fixed sample size test has the desirable property that its power function P(θ | α, n)
tends to 1 as θ tends to infinity.

Can we construct, for given α and n, a "good" mixed single sample test of level
α and expected sample size n in an analogous way? Clearly, if the best mixed
single sample test of H_0 against H_1 is a bona fide mixture, it is not even true that
its power function, P(θ), approaches 1 as θ approaches infinity. For, in this case,
the fixed sample size test (0, 1, 0) will be chosen with probability λ, say, where
0 < λ < 1, so that P(θ) ≤ 1 − λ for all θ.

However, it should be noted that the fact that P(θ) does not tend to 1 as θ
tends to infinity is not always undesirable for we know, in certain cases, that the
set of possible values of θ is bounded, e.g., in testing the mean height θ of
American soldiers, we know that θ ≤ 6 feet 2 inches. Consequently, a test procedure
which does not have high power at θ = 7 feet is not necessarily undesirable.

Finally, we note that if we restrict ourselves to randomizing over fixed sample
size tests of sample size n ≥ 1, then P(θ) → 1 as θ → ∞.
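The ceiling on the power of a bona fide mixture is easy to see numerically. The sketch below is illustrative: it assumes the usual one-sided test that rejects for large X̄, and the function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def power_fixed(theta, alpha, n, theta0=0.0, sigma=1.0):
    """Power at theta of the level-alpha test rejecting for large sample mean (n >= 1)."""
    z = STD.inv_cdf(1.0 - alpha)
    return 1.0 - STD.cdf(z - math.sqrt(n) * (theta - theta0) / sigma)


def power_mixed(theta, lam, alpha, n, theta0=0.0, sigma=1.0):
    """Power when (0, 1, 0) is used with probability lam and the fixed test otherwise."""
    return (1.0 - lam) * power_fixed(theta, alpha, n, theta0, sigma)
```

With λ = 0.2, for instance, `power_mixed` never exceeds 0.8 however large θ becomes, whereas `power_fixed` tends to 1.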


10. Comparison with the Wald Sequential Probability Ratio Test. In general,
it is difficult to compare the improvements attainable by using the Wald
Sequential Probability Ratio Test with improvements attainable by randomizing
over fixed sample size procedures. For, every test will now be identified with a
quadruple (α, β, E_{θ_0}(n), E_{θ_1}(n)). E_{θ_0}(n) and E_{θ_1}(n) are usually difficult
to calculate. However, in the case of mixed single sample tests, E_{θ_0}(n) = E_{θ_1}(n)
and these do not depend on the unknown value of θ. In some special cases it is easy
to make a comparison and this we shall do.

Example.

\[ f(x, \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta, \\ 0 & \text{elsewhere.} \end{cases} \]

It can be shown that if we use Wald's test, only two types of tests are attainable.
They are the test (1, 0, 0, 0) or tests of the form

\[ \Bigl( 0,\ \Bigl(\frac{\theta_0}{\theta_1}\Bigr)^k,\ k,\
   \frac{1 - (\theta_0/\theta_1)^k}{1 - \theta_0/\theta_1} \Bigr), \]

where k is a non-negative integer. However, using mixed single sample tests, we
can attain the test (1, 0, 0, 0) and tests of the form (0, (θ_0/θ_1)^k, k, k) where k is a
non-negative integer, and mixtures of such tests. Since

\[ \lim_{\theta_0/\theta_1 \to 1}
   \frac{k}{\dfrac{1 - (\theta_0/\theta_1)^k}{1 - \theta_0/\theta_1}} = 1, \]

it is clear that if θ_0/θ_1 is close to 1, then mixed single sample procedures are
almost as good as Wald procedures.
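The comparison of expected sample sizes under θ_1 can be tabulated with a few lines. This is a sketch with hypothetical function names; it assumes k ≥ 1 and writes r for θ_0/θ_1 < 1.

```python
def wald_expected_n_under_alternative(ratio, k):
    """(1 - r**k)/(1 - r), computed as the sum of r**j for j = 0, ..., k - 1."""
    return sum(ratio ** j for j in range(k))


def relative_cost(ratio, k):
    """Sample size k of the fixed test divided by the Wald expected sample size under theta_1."""
    return k / wald_expected_n_under_alternative(ratio, k)
```

For r = 0.999 and k = 10 the ratio is within one percent of 1, while for r = 0.5 the fixed sample size test needs about five times as many observations on average when θ_1 is true.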


11. Estimation. Can mixing fixed sample estimation procedures yield
improvements in estimation techniques? If we evaluate a fixed sample size estimator
t_n in terms of a pair of numbers {E[L(t_n, θ)], n}, where E[L(t_n, θ)] denotes the
expected loss if the estimator t_n is used when θ is the true parameter and where n
denotes the sample size, then mixing over fixed sample size procedures will not
yield improvements since in all problems of practical interest E[L(t_n, θ)] is a
convex function of n. For example, if we wish to estimate the mean θ of a
distribution with finite variance σ², then, if t_n = X̄ and if L(t_n, θ) = k(X̄ − θ)², we
find that E[L(t_n, θ)] = kσ²/n. Thus, it will not pay to randomize.
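A small numerical illustration of why convexity rules out a gain: the sketch below compares a fixed sample of 10 observations with an equal mixture of samples of 5 and 15, which has the same expected size. The function names and the values k = σ² = 1 are assumptions made for the example.

```python
def expected_loss(n, k=1.0, sigma2=1.0):
    """E[L(t_n, theta)] = k * sigma**2 / n for the sample mean under squared error."""
    return k * sigma2 / n


def mixed_expected_loss(weights_and_sizes, k=1.0, sigma2=1.0):
    """Expected loss of a mixture given pairs (gamma_i, n_i) with the gamma_i summing to 1."""
    return sum(g * expected_loss(n, k, sigma2) for g, n in weights_and_sizes)


fixed = expected_loss(10)                            # 0.1
mixed = mixed_expected_loss([(0.5, 5), (0.5, 15)])   # 0.5*(1/5) + 0.5*(1/15)
```

The mixture has expected loss 2/15, which exceeds the 1/10 of the fixed procedure, as Jensen's inequality predicts for any convex loss curve.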


12. Conclusion. In what situations is a mixed single sample procedure
justifiable? In order to answer this question, we must first realize that throughout this
paper, we have been judging a test δ by its operating characteristic (α, β, n). If
this triple is our only means of evaluating a test procedure, then it is true that
single sample procedures would not be justifiable since a sequential probability
ratio test achieving the given α and β would be better. However, practical
considerations might limit one to a single stage of sampling, e.g., in agricultural
experiments, one might not wish to use more than one stage of sampling; or, if
one is testing electric light bulbs, one might not wish to test the bulbs
sequentially. Other examples could be given.

One could reasonably ask why fixed sample size procedures should not always
be used in these situations. Presumably, if the experiment were a so-called "one
shot affair", i.e., if the experiment were never to be repeated, then one might
reasonably insist on a non-randomized fixed sample size procedure (although, of
course, this position is not universally held). However, if one repeats the
experiment often, it would be reasonable to use a mixed sample size procedure.
To illustrate this point, consider Example II in Section 4. In this example,
suppose θ_0 represents the probability that a person who has been contaminated with
a certain disease will respond positively to a certain test and θ_1 represents the
probability that a person who has not been contaminated will respond positively
to this same test. Then, if several thousand people are to be classified as either
contaminated or non-contaminated according to this test, the mixed test
(1/101, .525, 1) would be preferred to the test (.05, .525, 1) since the mixed test
will falsely classify less than 1 percent of the contaminated people whereas the
fixed sample size procedure will misclassify 5 percent of the contaminated
people. On the other hand, both tests will misclassify the same percentage of
non-contaminated people, and both procedures will use an average of one test per
person.

At this point, one could raise strenuous objections to mixed single sample tests
on grounds similar to those raised in Section 7, i.e., if one is told which single
sample test is actually used, the conditional probabilities of misclassification are
no longer α and β. For example, consider a mixed test of the form

(α, β, n) = γ(0, 1, 0) + (1 − γ)(α', β', n').

Now, suppose that a person is told that he has been classified according to the
test (0, 1, 0). Such a person would of course be most unhappy. On the other
hand, if he is not told which of the tests was used, he would maintain his
confidence in the procedure used. In other words, by withholding information, one can
influence a person's willingness to accept a result. Some feel that axiomatically this
is an untenable policy.


13. Acknowledgements. The author is indebted to Allan Birnbaum for sug- 
gesting the Bayes approach exploited in this paper. The author is also indebted 
to Howard Raiffa for his many helpful suggestions and constructive criticisms. 
ASYMPTOTIC NORMALITY AND EFFICIENCY OF CERTAIN
NONPARAMETRIC TEST STATISTICS¹


By Herman Chernoff and I. Richard Savage
Stanford University and University of Minnesota


1. Summary. Let X_1, ..., X_m and Y_1, ..., Y_n be ordered observations from
the absolutely continuous cumulative distribution functions F(x) and G(x)
respectively. If z_{Ni} = 1 when the ith smallest of N = m + n observations is an
X and z_{Ni} = 0 otherwise, then many nonparametric test statistics are of the
form

\[ m T_N = \sum_{i=1}^{N} E_{Ni} z_{Ni}. \]

Theorems of Wald and Wolfowitz, Noether, Hoeffding, Lehmann, Madow, and
Dwass have given sufficient conditions for the asymptotic normality of T_N.
In this paper we extend some of these results to cover more situations with
F ≠ G. In particular it is shown for all alternative hypotheses that the
Fisher-Yates-Terry-Hoeffding c_1-statistic is asymptotically normal and the test for
translation based on it is at least as efficient as the t-test.


2. Introduction. Finding the distributions of nonparametric test statistics
and establishing optimum properties of these tests for small samples has
progressed more slowly than the corresponding large sample theory. Even so, it is not
possible to state that the basic framework of the large sample theory has been
completed. Dwass [3] has recently presented a general theorem on the asymptotic
normality of certain nonparametric test statistics under alternative hypotheses.
His results, however, do not apply to such important and interesting procedures
as the c_1-test [11]. Many papers have appeared giving the asymptotic
efficiency of particular tests. Hodges and Lehmann [7] have discussed the asymptotic
efficiency of the Wilcoxon test with respect to all translation alternatives.
In the same paper they have conjectured that the c_1-test is as efficient as the
t-test for normal alternatives and at least as efficient as the t-test for all other
alternatives.

The beginning of our work came from a desire to verify the Hodges and Lehmann
conjecture. Related to the conjecture is the hypothesis that the c_1-statistic
is asymptotically normally distributed. Thus our work has two parts: developing
a new theorem for asymptotic normality of nonparametric test statistics and
establishing the variational argument required for determining the minimum
efficiency of test procedures.
Our basic result on the asymptotic normality of statistics of the form T_N is
Theorem 1 of Section 4. This theorem is a partial generalization of results of
Dwass [3] summarized in our Theorem 4. Theorem 1 is not given in the most
general form possible. Our choice of the level of generality was to facilitate our
writing and your reading.

Section 3 contains our basic notation and assumptions. Section 4 contains
statements of the theorem on asymptotic normality as well as the basic portion
of the proof. Details regarding the negligibility of the remainder terms are given
in Section 7. The variational arguments are presented in Section 5 and Section
6 relates our Theorem 1 to Dwass's results. Applications of Theorem 1 to several
nonparametric tests are given in Section 6.


3. Assumptions and notation. Let X_1, X_2, ..., X_m be the ordered observations
of a random sample from a population with continuous cumulative distribution
function F(x). Let Y_1, Y_2, ..., Y_n be the ordered observations of a
random sample from a population with continuous cumulative distribution function
G(x). Let N = m + n and λ_N = m/N and assume that for all N the
inequalities 0 < λ_0 ≤ λ_N ≤ 1 − λ_0 < 1 hold for some fixed λ_0 ≤ 1/2.

Let F_m(x) = (number of X_i ≤ x)/m and G_n(x) = (number of Y_i ≤ x)/n.
Thus F_m(x) and G_n(x) are the sample cumulative distribution functions of the
X's and Y's respectively. Define H_N(x) = λ_N F_m(x) + (1 − λ_N)G_n(x). Thus
H_N(x) is the combined sample cumulative distribution function. The combined
population cumulative distribution function is H(x) = λ_N F(x) + (1 − λ_N)G(x).
Even though H(x) depends on N (or rather m and n) through λ_N our notation
suppresses this fact for convenience. In fact F(x) and G(x) may actually depend
on N although this will not be stated explicitly. In Corollary 1 the distributions
do depend on N. The point for suppressing this fact is that our limit theorems are
"uniform" and hold, whether the distributions are constant, tend to a limit, or
vary rather arbitrarily with the sample size N.


If the ith smallest in the combined sample is an X let z_{Ni} = 1 and otherwise
let z_{Ni} = 0. Then our concern is with statistics of the form

(3.1) \[ m T_N = \sum_{i=1}^{N} E_{Ni} z_{Ni}, \]

where the E_{Ni} are given numbers. (The special case where E_{Ni} = E(i/N) is
particularly easily handled by our methods. For the Wilcoxon test this condition
is met with E_{Ni} = i/N, and Freund and Ansari [6] have considered E_{Ni} =
E(i/N) = |1/2 − i/N| in testing for the equality of dispersion of two
populations.) The definition (3.1) of T_N is the one conventionally used. We shall,
however, use the following representation:

(3.2) \[ T_N = \int_{-\infty}^{\infty} J_N[H_N(x)]\,dF_m(x). \]

The definitions (3.1) and (3.2) are equivalent when E_{Ni} = J_N(i/N). A
representation like (3.2) was used by Blum and Weiss [1, page 243, Eq. 2.4] and R. v.
Mises considered f ¢(x) dF’,(x) in detail [9]. 
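The equivalence of (3.1) and (3.2) when E_{Ni} = J_N(i/N) can be checked on a small sample. The sketch below is illustrative: the function names and the data are assumptions, ties are assumed absent (as they are with probability one for continuous distributions), and the integral in (3.2) reduces to an average of J_N[H_N(X_j)] over the X observations.

```python
def t_n_from_scores(xs, ys, score):
    """(3.1): m*T_N = sum over i of E_Ni * z_Ni, with E_Ni = score(i/N)."""
    m, big_n = len(xs), len(xs) + len(ys)
    # Tag each observation with z = 1 for an X and z = 0 for a Y, then sort.
    combined = sorted([(v, 1) for v in xs] + [(v, 0) for v in ys])
    total = sum(score((i + 1) / big_n) * z for i, (_, z) in enumerate(combined))
    return total / m


def t_n_from_integral(xs, ys, score):
    """(3.2): T_N = integral of J_N[H_N(x)] dF_m(x) = mean of J_N(H_N(X_j))."""
    m, big_n = len(xs), len(xs) + len(ys)
    pooled = list(xs) + list(ys)

    def h_n(x):
        # Combined sample cdf: fraction of all N observations not exceeding x.
        return sum(1 for v in pooled if v <= x) / big_n

    return sum(score(h_n(x)) for x in xs) / m


# Assumed illustrative data with no ties.
xs = [0.3, 1.7, 2.2]
ys = [0.9, 1.1, 3.0, 4.5]
wilcoxon = lambda u: u                   # E_Ni = i/N
freund_ansari = lambda u: abs(0.5 - u)   # E_Ni = |1/2 - i/N|
```

Here the X observations hold ranks 1, 4 and 5 among N = 7, so both forms give the Wilcoxon value (1 + 4 + 5)/(7 · 3) = 10/21.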

Throughout our proofs K will be used as a generic constant which may depend
on J_N but it will not depend on F(x), G(x), m, n, N. Statements involving o_p
or O_p will always be uniform in F(x), G(x), H(x), and λ_N in the interval 0 <
λ_0 ≤ λ_N ≤ 1 − λ_0 < 1.

While J_N need be defined only at 1/N, 2/N, ..., N/N, we shall find it
convenient to extend its domain of definition to (0, 1] by some convention such as
letting J_N be constant on (i/N, (i + 1)/N].

Let I_N be the interval in which 0 < H_N(x) < 1. Then I_N is closed on the left
at the smallest observation and open on the right at the largest observation.
The interval, I_N, has a random location.


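4. Asymptotic normality.
THEOREM 1. If

(1) J(H) = lim_{N→∞} J_N(H) exists for 0 < H < 1 and is not constant,

(2) \[ \int_{I_N} [J_N(H_N) - J(H_N)]\,dF_m(x) = o_p(N^{-1/2}), \]

(3) J_N(1) = o(N^{1/2}),

(4) \[ |J^{(i)}(H)| = \Bigl|\frac{d^i J}{dH^i}\Bigr| \le K[H(1 - H)]^{-i - 1/2 + \delta} \]

for i = 0, 1, 2, and for some δ > 0,
then, for fixed F, G and λ_N,

(4.1) \[ \lim_{N \to \infty} P\Bigl(\frac{T_N - \mu_N}{\sigma_N} \le t\Bigr)
        = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\,dx, \]

where

(4.2) \[ \mu_N = \int_{-\infty}^{\infty} J[H(x)]\,dF(x) \]

and

(4.3) \[ N\sigma_N^2 = 2(1 - \lambda_N) \Bigl\{
        \iint_{-\infty < x < y < \infty} G(x)[1 - G(y)]\,J'[H(x)]\,J'[H(y)]\,dF(x)\,dF(y) \]
      \[ \qquad\qquad + \frac{1 - \lambda_N}{\lambda_N}
        \iint_{-\infty < x < y < \infty} F(x)[1 - F(y)]\,J'[H(x)]\,J'[H(y)]\,dG(x)\,dG(y) \Bigr\}, \]

provided σ_N ≠ 0.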

In Eqs. 4.1 and 4.3 we put subscripts on μ and σ to recall that these depend
on F, G and λ_N and are meaningful in the more general case where F, G, and λ_N
are not fixed. Corollary 1 will extend Theorem 1 to obtain convergence to
normality uniformly with respect to F, G, and λ_N for a broad range of F, G, and λ_N.
To facilitate the proof of Corollary 1, we will regard F, G, and λ_N as variable
throughout the proof of Theorem 1 except where it is specified otherwise.
Assumption 1 is likely to be fulfilled whenever one speaks of a sequence of tests.
In the special case E_{Ni} = E(i/N) of course J_N = E = J and Assumption 2
will automatically be satisfied. Theorem 2 shows that Assumptions 1, 2 and 3
are often satisfied when the E_{Ni} are the mean values of order statistics.
Assumption 4 is the basic condition. The assumption has two functions: it limits the
growth of the coefficients E_{Ni} and it supplies certain smoothness properties.
Both conditions are essential to our argument. We believe that the theorem is
true without the smoothness condition.
PROOF. To begin the proof we rewrite T_N as

\[ T_N = \int_{0 < H_N \le 1} J_N(H_N)\,dF_m(x)
       = \int_{I_N} [J_N(H_N) - J(H_N)]\,dF_m(x)
       + \int_{I_N} J(H_N)\,dF_m(x) + \int_{H_N = 1} J_N(H_N)\,dF_m(x). \]

In the second integral we write dF_m = d(F_m − F + F), J(H_N) = J(H) +
(H_N − H)J'(H) + ((H_N − H)²/2)J''[φH_N + (1 − φ)H], where 0 < φ < 1, and
H = λ_N F + (1 − λ_N)G. After multiplying out the expression becomes

\[ T_N = A + B_{1N} + B_{2N} + \sum_{i=1}^{6} C_{iN}, \]

where

(4.4) \[ A = \int_{0 < H < 1} J(H)\,dF(x), \]

(4.5) \[ B_{1N} = \int_{0 < H < 1} J(H)\,d[F_m(x) - F(x)], \]

(4.6) \[ B_{2N} = \int_{0 < H < 1} (H_N - H)\,J'(H)\,dF(x), \]

(4.7) \[ C_{1N} = \lambda_N \int_{0 < H < 1} (F_m - F)\,J'(H)\,d[F_m(x) - F(x)], \]

(4.8) \[ C_{2N} = (1 - \lambda_N) \int_{0 < H < 1} (G_n - G)\,J'(H)\,d[F_m(x) - F(x)], \]

(4.9) \[ C_{3N} = \int_{I_N} \frac{(H_N - H)^2}{2}\,J''[\varphi H_N + (1 - \varphi)H]\,dF_m(x), \]

(4.10) \[ C_{4N} = \int_{H_N = 1} [-J(H) - (H_N - H)J'(H)]\,dF_m(x), \]

(4.11) \[ C_{5N} = \int_{I_N} [J_N(H_N) - J(H_N)]\,dF_m(x), \]

(4.12) \[ C_{6N} = \int_{H_N = 1} J_N(H_N)\,dF_m(x). \]
The A, B, C terms represent the "constant," "first order random," and "higher
order random" portions respectively of T_N. In this section a detailed study of
the A and B terms is made and in Section 7 it is shown that the C terms are of
higher order.

The "constant" term, A = ∫_{0<H<1} J(H) dF(x), is finite as a result of
Assumption 4 of Theorem 1; see Section 7.A.10. Since A depends on λ_N as well as F(x)
and G(x) it need not converge as N → ∞, but it does remain bounded.

Integrating B_{2N} by parts and using the fact that

\[ \int_{-\infty}^{\infty} d[F_m(x) - F(x)] = 0, \]

we obtain

(4.13) \[ B_{1N} + B_{2N} = [1 - \lambda_N] \Bigl\{ \int_{-\infty}^{\infty} B(x)\,d[F_m(x) - F(x)]
          - \int_{-\infty}^{\infty} B^*(x)\,d[G_n(x) - G(x)] \Bigr\}, \]

where

(4.14) \[ B(x) = \int_{x_0}^{x} J'[H(y)]\,dG(y), \]

(4.15) \[ B^*(x) = \int_{x_0}^{x} J'[H(y)]\,dF(y), \]

and

\[ \lambda_N B^*(x) + (1 - \lambda_N) B(x) = J[H(x)] - J[H(x_0)], \]

with x_0 determined somewhat arbitrarily, say by H(x_0) = 1/2.
Thus,

(4.16) \[ B_{1N} + B_{2N} = [1 - \lambda_N] \Bigl\{ \frac{1}{m} \sum_{i=1}^{m} [B(X_i) - \mathcal{E}B(X)]
          - \frac{1}{n} \sum_{i=1}^{n} [B^*(Y_i) - \mathcal{E}B^*(Y)] \Bigr\}, \]

where ℰ represents expectation and X and Y have the F and G distributions
respectively.

The two summations involve independent samples of identically distributed
random variables. Therefore, if F, G, and λ_N are fixed, B(X) and B*(Y) are
specified random variables and we may apply the central limit theorem to show that
B_{1N} + B_{2N} when properly normalized has a Gaussian distribution in the limit.
The central limit theorem applies if the variances of B(X) and B*(Y) are finite
and at least one is positive.

First, we shall find a bound on the moments of B(X) and B*(Y):

\[ |B(x)| = \Bigl| \int_{x_0}^{x} J'[H(y)]\,dG(y) \Bigr| \le K\{H(x)[1 - H(x)]\}^{-1/2 + \delta}. \]

Thus for δ' > 0 such that (2 + δ')(−1/2 + δ) > −1,

\[ \mathcal{E}\{|B(X)|^{2+\delta'}\} \le K \int \{H(x)[1 - H(x)]\}^{(-1/2+\delta)(2+\delta')}\,dF(x)
   \le K \int_0^1 [H(1 - H)]^{(-1/2+\delta)(2+\delta')}\,dH \le K, \]

having made use of dG ≤ (1/λ_0) dH. (See Section 7.A.8.)

Similarly, we may bound the 2 + δ' absolute moments of B*(Y). The asymptotic
normality of B_{1N} + B_{2N} follows providing B(X) and B*(Y) do not both
have zero variance.

We compute the variances of B(X) and B*(Y). These can be expressed in
terms of ∫ B(x) dF(x), ∫ B²(x) dF(x), etc., but we shall use a slightly different
approach.


\[ B(X) - \mathcal{E}B(X) = \int B(x)\,d[F_1(x) - F(x)]
   = -\int [F_1(x) - F(x)]\,J'[H(x)]\,dG(x) \]

has variance

\[ \sigma^2_{B(X)} = \mathcal{E} \iint [F_1(x) - F(x)][F_1(y) - F(y)]\,J'[H(x)]\,J'[H(y)]\,dG(x)\,dG(y) \]

and

(4.17) \[ \sigma^2_{B(X)} = 2 \iint_{-\infty < x < y < \infty} F(x)[1 - F(y)]\,J'[H(x)]\,J'[H(y)]\,dG(x)\,dG(y), \]

if it is permitted to interchange expectation and integral. That this may be done
follows from Fubini's theorem when it is seen that for x < y,

\[ \mathcal{E}\{|F_1(x) - F(x)|\,|F_1(y) - F(y)|\} \le K F(x)[1 - F(y)] \]

and that the last integral above is finite. (In fact this integral is bounded in the
argument dealing with the C terms in Section 7.B.)
Similarly, the variance of B*(Y) is given by

(4.18) \[ \sigma^2_{B^*(Y)} = 2 \iint_{-\infty < x < y < \infty} G(x)[1 - G(y)]\,J'[H(x)]\,J'[H(y)]\,dF(x)\,dF(y). \]

These two variances when combined give the variance result stated in (4.3).
We review the status of our proof. In Section 7, the C terms are shown to be
"higher order uniformly." The A term is non-random and finite. Finally
B_{1N} + B_{2N} is the sum of two independent terms each of which is the average of
random variables with mean 0 and finite second moments. Theorem 1 follows.

The proof given can be extended to the case where F, G and λ_N are not fixed.
To obtain uniform convergence to normality, we apply a theorem of Esseen
([4], p. 43) which is a generalization of the so-called Berry-Esseen theorem
([8], p. 288)². Since the C terms are uniformly o_p(1/√N) it suffices to obtain
uniform convergence for B_{1N} + B_{2N}. For this it suffices to bound ρ_{2+δ'} for
B(X) and B*(Y). Since we bounded the absolute 2 + δ' moments, all that is
required is to bound the variances of B(X) and B*(Y) away from 0 and to have
m and n → ∞. Thus we have

COROLLARY 1. If the conditions 1 to 4 of Theorem 1 are satisfied, and F, G, and
λ_N (0 < λ_0 ≤ λ_N ≤ 1 − λ_0 < 1) are restricted to a set for which B(X) and B*(Y)
have variances bounded away from 0, then Eq. 4.1 (asymptotic normality) holds
uniformly with respect to F, G, and λ_N.

COROLLARY 2. If conditions 1 to 4 of Theorem 1 are satisfied, 0 < λ_0 ≤ λ_N ≤
1 − λ_0 < 1,

F(x) = Ψ(x − θ_N),
G(x) = Ψ(x − φ_N),

where Ψ has a density ψ, then Eq. 4.1 holds uniformly with respect to λ_N, θ_N and
φ_N for φ_N − θ_N in some neighborhood of 0. If φ_N − θ_N → 0,

(4.19) \[ \lim_{N \to \infty} \frac{\lambda_N N \sigma_N^2}{1 - \lambda_N}
        = 2 \iint_{0 < x < y < 1} x(1 - y)\,J'(x)\,J'(y)\,dx\,dy
        = \int_0^1 J^2(x)\,dx - \Bigl( \int_0^1 J(x)\,dx \Bigr)^2. \]

PROOF. It suffices to show that B(X) and B*(Y) have variances bounded away
from zero and to establish Eq. 4.19. Since J is not constant and has a second
derivative, there is an interval of u in which J'(u) is bounded away from 0 and
in which J'(u) > 0 or in which J'(u) < 0. There is a corresponding interval of
x for which Ψ(x) lies in the u interval and its density ψ(x) is almost everywhere
bounded away from 0. For φ_N − θ_N small enough, there is an x interval whose
length is bounded away from 0 where the densities f(x) = ψ(x − θ_N) and g(x) =
ψ(x − φ_N) are almost everywhere bounded away from 0 and J'[H(x)] is bounded
away from zero. It follows that B(X) and B*(Y) have variances bounded away
from zero.

All that remains is to establish Eq. 4.19. The first equality follows directly 
from Theorem | by letting F(x) = «* and G(r) > x*. The second equality can 
be obtained by interpreting the double integral as 


\[ 2 \int\!\!\int\!\!\int\!\!\int_{0 < u < x < y < v < 1} J'(x)\,J'(y)\,du\,dx\,dy\,dv \]


² Esseen's theorem states that if X_1, X_2, ..., X_n are independent observations from a
population with mean 0, variance σ², and finite absolute 2 + δ' moment β_{2+δ'}, 0 < δ' ≤ 1,
then

\[ |F^* - \Phi^*| \le C \Bigl\{ \frac{\rho_{2+\delta'}}{n^{\delta'/2}} + \frac{\rho_{2+\delta'}^{1/\delta'}}{n^{1/2}} \Bigr\}, \]

where F* is the cdf of X̄, Φ* is the approximating normal cdf, C depends only on δ' and
ρ_{2+δ'} = β_{2+δ'}/σ^{2+δ'}.







and integrating with respect to y first and x second. It can also be obtained by
considering a standard derivation [13] of the asymptotic distribution of T_N
when F = G where T_N is regarded as the average of a sample of m from the
population of N numbers J_N(1/N), J_N(2/N), ..., J_N(N/N).

We remark that normalizing J so that ∫_0^1 J(x) dx = 0 and ∫_0^1 J²(x) dx = 1
will not affect the efficiency of the test. Furthermore, if J is the inverse of a cdf,
the right-hand side of (4.19) is the variance of that distribution.
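The second equality in (4.19) can be checked numerically for simple choices of J. The sketch below is an illustration only: the function names, the midpoint rule and the step count are assumptions, and J is taken smooth on [0, 1] so that no special care is needed at the endpoints.

```python
def variance_double_integral(j_prime, steps=2000):
    """2 * integral over 0 < x < y < 1 of x(1 - y) J'(x) J'(y) dx dy, by the midpoint rule."""
    h = 1.0 / steps
    inner = 0.0   # running approximation of the integral of x*J'(x) over completed cells
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        g = y * j_prime(y)
        # Inner integral from 0 to the midpoint y: completed cells plus half the current cell.
        total += (1.0 - y) * j_prime(y) * (inner + 0.5 * g * h) * h
        inner += g * h
    return 2.0 * total


def variance_from_moments(j, steps=2000):
    """Integral of J**2 minus the square of the integral of J over (0, 1)."""
    h = 1.0 / steps
    points = [(i + 0.5) * h for i in range(steps)]
    first = sum(j(u) for u in points) * h
    second = sum(j(u) ** 2 for u in points) * h
    return second - first * first
```

For J(u) = u, the inverse cdf of the uniform distribution on (0, 1), both sides equal 1/12, the variance of that distribution; for J(u) = u² both equal 4/45.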

In applying Theorem 1 the verification of condition 2 may cause some
difficulty. The following Theorem 2 gives a simple sufficient condition under which
conditions 1, 2, and 3 hold. In particular with the use of Theorem 2 it is simple
to verify that the distribution of the c_1-statistic does approach a Gaussian
distribution for alternative hypotheses.

THEOREM 2. If J_N(i/N) is the expectation of the ith order statistic of a sample of
size N from a population whose cumulative distribution function is the inverse
function of J and

\[ |J^{(i)}(u)| \le K[u(1 - u)]^{-i - 1/2 + \delta}, \quad i = 0, 1, 2, \]

then

\[ \lim_{N \to \infty} J_N(H) = J(H), \quad 0 < H < 1, \]

\[ J_N(1) = o(N^{1/2}) \]

and

\[ \int_{I_N} [J_N(H_N) - J(H_N)]\,dF_m(x) = o(N^{-1/2}). \]

(We write o instead of o_p because the random sequence is bounded by a
non-random sequence which is o(N^{−1/2}). In fact |∫ [J_N(H_N) − J(H_N)] dF_m(x)| ≤
(1/λ_N) ∫ |J_N(H_N) − J(H_N)| dH_N(x) and our proof essentially shows that this
latter integral, which is non-random and independent of F and G, is o(N^{−1/2}).)

PROOF. It is well known that J_N(H) → J(H). A proof of the other two results
is given in Section 7.C.


5. Variational argument. We have now established that the limiting
distribution of the c_1-statistic is Gaussian. Thus we may proceed with the study of the
efficiency of this test procedure. We will examine translation alternatives only.
Since the power of the c_1-test approaches one when the distributions F and G
are held fixed as N approaches infinity we restrict our consideration to the
following situation.

There is a distribution function Ψ(x) which does not depend on N and F(x) =
Ψ(x − θ) and G(x) = Ψ(x − φ). We test the hypothesis that Δ = θ − φ = 0
vs. "near" alternatives of the form Δ = Δ_N = cN^{−1/2}. We will also assume that

\[ 0 < \lim_{N \to \infty} \lambda_N = \lambda < 1. \]
With this framework we are able to use the Pitman criterion (the one considered
by Hodges and Lehmann) for finding efficiencies of test procedures. The following
conditions have been established for the $c_1$-statistic if $\Psi$ has a density and
clearly hold for the t-statistic if $\Psi$ has finite second moments. There are
functions $a_N(\Delta)$ and $b_N(\Delta)$ such that for $\Delta$ in some neighborhood of 0,

(5.1) $\mathcal{L}\left(\dfrac{T_N - a_N(\Delta)}{b_N(\Delta)}\right) \to N(0, 1)$,

(5.2) $\lim_{N\to\infty} \dfrac{b_N(\Delta_N)}{b_N(0)} = 1$,

(5.3) $\mathcal{E}_T = \lim_{N\to\infty} \dfrac{[a_N(\Delta_N) - a_N(0)]^2}{\Delta_N^2\, N\, b_N^2(0)}$

exists and is independent of c.

The quantity $\mathcal{E}_T$ is called the efficacy of the procedure based on the
sequence of statistics $T_N$. Of course $\mathcal{E}_T$ depends on $\Psi$. In comparing
two sequences of tests, say $T_N$ and $T_N^*$, for the same pair of near alternatives
the two tests will have the same power only when the corresponding sample sizes,
N and N*, satisfy the following relationship

(5.4) $\lim_{N\to\infty} \dfrac{N^*}{N} = \dfrac{\mathcal{E}_T}{\mathcal{E}_{T^*}} = E_{T,T^*}$

if $\mathcal{E}_{T^*} \ne 0$. $E_{T,T^*}$ is called the asymptotic relative efficiency of
$T_N$ with respect to $T_N^*$.

Let $E_{c_1,t}(\Psi)$ denote the asymptotic efficiency relative to the t-test of the
$c_1$-test against translation alternatives. Then we have $J = J_0$, the inverse of
the normal N(0, 1) cdf $\Phi$, and applying Corollary 1 and using derivatives in the
expression for $\mathcal{E}_T$, we have

(5.5) $E_{c_1,t}(\Psi) = I_{1\Psi}^2 \sigma^2$,

where

(5.6) $I_{1\Psi} = \int J_0'[\Psi(x)]\,\psi^2(x)\,dx$

and $\sigma^2$ is the variance of the distribution with cdf $\Psi$ (and density $\psi$).
Normalizing $\Psi$ to have mean 0 and variance 1 does not affect $E_{c_1,t}(\Psi)$,
which then becomes equal to $I_{1\Psi}^2$. In this section we shall prove

THEOREM 3. If $\Psi$ is a cdf with a density and finite second moment, then
$E_{c_1,t}(\Psi) \ge 1$, and $E_{c_1,t}(\Psi) = 1$ only if $\Psi$ is normal.


PROOF. It suffices to show that the minimum of $I_{1\Psi}$ subject to the restrictions

$$I_{2\Psi} = \int x\,\psi(x)\,dx = 0$$

and

$$I_{3\Psi} = \int x^2\psi(x)\,dx = 1$$

is attained only for $\Psi = \Phi$ and that $I_{1\Phi} = 1$.

If $\Psi$ does not have finite variance $\sigma^2$, $E_{c_1,t}$ is not defined but it makes
sense to regard it as $\infty$.

A density $\psi(x)$ assigns to each x a value of $\Psi$ and a corresponding value of
$J_0[\Psi(x)]$. If $\psi(x) = 0$ a.e. on an interval, this interval corresponds to a
fixed value of $J_0[\Psi(x)]$. If x is then regarded as a function of $J_0$, it is
multivalued at that value of $J_0$. Otherwise x is continuous and it is increasing
in $J_0$. Conversely any monotone non-decreasing function x of $J_0$ determines a
corresponding cdf $\Psi$. We have

$$u = \Phi[J_0(u)],$$

$$J_0'(u) = \frac{1}{\varphi[J_0(u)]},$$

and

$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}.$$

Furthermore

(5.7) $\int_{-\infty}^{x} \psi(t)\,dt = \Psi(x) = \int_{-\infty}^{J_0} \varphi(t)\,dt$

and

$$\psi(x)\,dx = d\Psi(x) = \varphi(J_0)\,dJ_0.$$


Consequently our problem consists of finding a monotone function $x(J_0)$ which
minimizes

(5.8) $I_1 = \int \dfrac{1}{\varphi(J_0)} \left(\dfrac{\varphi(J_0)}{dx/dJ_0}\right) \varphi(J_0)\,dJ_0 = \int \dfrac{\varphi(J_0)}{dx/dJ_0}\,dJ_0$

subject to the restrictions

(5.9) $I_2 = \int x\,\psi(x)\,dx = \int x\,\varphi(J_0)\,dJ_0 = 0$,

and

(5.10) $I_3 = \int x^2\psi(x)\,dx = \int x^2\varphi(J_0)\,dJ_0 = 1$.

In the above form it is immediately obvious that if $\Psi = \Phi$, then $x = J_0$ and
hence $I_{1\Phi} = 1$. This form is also more suitable for our variational approach.


Suppose now that x is replaced by $x^* = cx$. Then $I_1$, $I_2$ and $I_3$ are replaced
by $I_1^* = I_1/c$, $I_2^* = cI_2$, and $I_3^* = c^2 I_3$. Thus if $I_2 = 0$ and
$I_3 < 1$, we can obtain $I_2^* = 0$ and $I_3^* = 1$ with $I_1^* < I_1$. This
discussion is relevant to the proof of the following lemma.
LEMMA 1. The solution of the minimization problem is unique if it exists.
PROOF. Suppose $x_1$ and $x_2$ are distinct functions with non-negative derivatives.
Then let $x = (1 - w)x_1 + wx_2$, where $0 \le w \le 1$. Then, by convexity

$$I_1(w) = \int \frac{\varphi(J_0)}{dx/dJ_0}\,dJ_0 \le (1 - w)\int \frac{\varphi(J_0)}{dx_1/dJ_0}\,dJ_0 + w\int \frac{\varphi(J_0)}{dx_2/dJ_0}\,dJ_0,$$

$$I_2(w) = \int x\,\varphi(J_0)\,dJ_0 = (1 - w)\int x_1\varphi(J_0)\,dJ_0 + w\int x_2\varphi(J_0)\,dJ_0,$$

and

$$I_3(w) = \int x^2\varphi(J_0)\,dJ_0 < (1 - w)\int x_1^2\varphi(J_0)\,dJ_0 + w\int x_2^2\varphi(J_0)\,dJ_0.$$

Hence $x_1$ and $x_2$ cannot both be solutions of the minimization problem since
otherwise a multiple of $(x_1 + x_2)/2$ would then satisfy the side conditions and
yield a smaller $I_1$.

With this lemma, all that remains is to show that $x = J_0$ is a solution of the
problem. To this end we establish a sufficient condition for the solution of the
problem as follows. Suppose that $x_1$ and $x_2$ are monotone functions satisfying
the restrictions where $x_2$ gives a lower value for $I_1$ than does $x_1$. Then using
the convexity again (with derivatives taken at $w = 0$ along
$x = (1 - w)x_1 + wx_2$), we have

$$I_1'(0) = -\int \frac{d(x_2 - x_1)/dJ_0}{(dx_1/dJ_0)^2}\,\varphi(J_0)\,dJ_0 < 0,$$

$$I_2'(0) = \int (x_2 - x_1)\,\varphi(J_0)\,dJ_0 = 0,$$

and

$$I_3'(0) = 2\int x_1(x_2 - x_1)\,\varphi(J_0)\,dJ_0 < 0.$$

Consequently we have
LEMMA 2. If $x_1$ satisfies the restrictions and if for each $x_2$ which does so also
there is a $\xi \ge 0$ such that

$$I_1'(0) + \xi I_3'(0) = 0,$$

then $x_1$ is the unique solution of the minimization problem.

This sufficient condition is essentially the usual Euler equation except that with the
convexity at our disposal and the monotonicity restriction, it plays the role of a
sufficient instead of a necessary condition.
Now

$$I_1'(0) = -\left[(x_2 - x_1)\frac{\varphi(J_0)}{(dx_1/dJ_0)^2}\right] + \int (x_2 - x_1)\left\{\frac{\varphi'(J_0)}{(dx_1/dJ_0)^2} - \frac{2\,(d^2x_1/dJ_0^2)\,\varphi(J_0)}{(dx_1/dJ_0)^3}\right\} dJ_0.$$

Now let $x_1(J_0) = J_0$. Then

$$I_1'(0) + \xi I_3'(0) = \int (x_2 - x_1)\,[\varphi'(J_0) + 2\xi J_0\varphi(J_0)]\,dJ_0,$$

which vanishes for $\xi = 1/2$. Applying Lemma 2 establishes our theorem.
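
As a numerical illustration of Theorem 3 (not part of the original argument), the efficiency $\sigma^2 I_{1\Psi}^2$ can be evaluated for particular distributions. The sketch below assumes the substitution $z = J_0(\Psi(x))$, under which (5.6) becomes $I_{1\Psi} = \int \psi(\Psi^{-1}(\Phi(z)))\,dz$; the function names, the integration range, and the choice of the logistic distribution are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def i1(density_at_quantile, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid approximation of I_1 = integral of psi(Psi^{-1}(Phi(z))) dz.

    This is (5.6) after the substitution z = J_0(Psi(x)); the argument
    density_at_quantile(u) must return psi(Psi^{-1}(u)) for 0 <= u <= 1.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * density_at_quantile(STD_NORMAL.cdf(z))
    return total * h


def efficiency(density_at_quantile, variance):
    """sigma^2 * I_1^2, the form of (5.5)."""
    return variance * i1(density_at_quantile) ** 2


def normal_density_at_quantile(u):
    # Standard normal: psi(Psi^{-1}(u)) = phi(Phi^{-1}(u)); guard the endpoints.
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))


def logistic_density_at_quantile(u):
    # Standard logistic: psi(Psi^{-1}(u)) = u * (1 - u).
    return u * (1.0 - u)


eff_normal = efficiency(normal_density_at_quantile, 1.0)
eff_logistic = efficiency(logistic_density_at_quantile, math.pi ** 2 / 3.0)
print(eff_normal, eff_logistic)
```

For the normal case the value is 1, the boundary case of the theorem; for the logistic case the integral equals $1/\sqrt{\pi}$, so the efficiency is $\pi/3$, slightly above 1.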

If we regard the $c_1$-test as one tailor-made to compete against the best
parametric test for translation when F and G are normal, we may inquire about
nonparametric tests designed to compete against the best parametric tests when
F and G have some other form.


Suppose F and G are known to be of the form $F_0(x - \theta)$ and $F_0(x - \phi)$
respectively, where $F_0$ has a twice differentiable density $f_0$. Then an efficient
test statistic for $\Delta = \theta - \phi = 0$ would be the maximum-likelihood estimate

$$\hat\Delta = \hat\theta - \hat\phi$$

for which the asymptotic distribution is normal with mean $\Delta$ and variance
$[N\lambda(1 - \lambda)(\inf_{F_0})]^{-1}$, where

(5.11) $\inf_{F_0} = \int \dfrac{[f_0'(x)]^2}{f_0(x)}\,dx$,

providing the above integral exists. The relative efficiency of our nonparametric
test based on the test statistic T with a specified normalized J to the $\hat\Delta$
test is

(5.5a) $E_{T,\hat\Delta}(F_0) = \dfrac{I_{1F_0}^2}{\inf_{F_0}}$

where

(5.6a) $I_{1F_0} = \int J'[F_0(x)]\,f_0^2(x)\,dx$.


It can be shown that the best J in the sense that it maximizes $E_{T,\hat\Delta}(F_0)$
is given by

(5.12) $J(u) = -\dfrac{f_0'(x)}{f_0(x)}\,(\inf_{F_0})^{-1/2}$

where $u = F_0(x)$. In fact for this J, we have

$$I_{1F_0} = -(\inf_{F_0})^{-1/2}\int \left[\frac{f_0''(x)}{f_0(x)} - \frac{[f_0'(x)]^2}{f_0^2(x)}\right] f_0(x)\,dx = (\inf_{F_0})^{1/2}$$

and

$$E_{T,\hat\Delta} = 1.$$

As is to be expected, if $F_0 = \Phi$ (N(0, 1)), the corresponding $J = J_0$, the
inverse of $\Phi$. The problem of comparing the nonparametric with the parametric
procedures designed for $F_0$ when F and G are translates of $\Psi \ne F_0$ is hindered
by our ignorance of the behavior of the parametric procedure when $\Psi \ne F_0$.

There seems to be no clear-cut statement in the literature which would establish the
test based on $\hat\Delta$ as an efficient test invariant under the same translation of
the $X_i$ and $Y_j$. The authors wish to thank the referee who pointed out the
following elegant proof. The efficacy of the $\hat\theta - \hat\phi$ test is
$\lambda(1 - \lambda)\inf_{F_0}$, where $\inf_{F_0}$ is the information of $F_0$. No
invariant test of $\Delta = \Delta_N$ vs. $\Delta = 0$ can have greater efficacy than the
likelihood ratio test for testing $\Delta = \Delta_N$ vs. $\Delta = 0$ when the densities
of X and Y are $f_0(x + (1 - \lambda)\Delta)$ and $f_0(x - \lambda\Delta)$. A standard
calculation gives this test efficacy $\lambda(1 - \lambda)\inf_{F_0}$. Thus our
$\hat\theta - \hat\phi$ test is efficient.

Let J be normalized so that $\int J(u)\,du = 0$ and $\int J^2(u)\,du = 1$.
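
To make (5.12) concrete, the following sketch works it out for one assumed choice of $F_0$, the standard logistic distribution, which is not discussed in the text. There $-f_0'(x)/f_0(x) = 2F_0(x) - 1$ and the information (5.11) equals 1/3, so the optimal score is linear in u (a Wilcoxon-type score). The code checks the normalization numerically; the function names are hypothetical.

```python
import math


def logistic_optimal_score(u):
    """Score (5.12) for the standard logistic F_0, as a function of u = F_0(x).

    For the logistic density, -f_0'(x)/f_0(x) = 2*F_0(x) - 1 and the
    information (5.11) equals 1/3, so J(u) = sqrt(3) * (2u - 1).
    """
    return math.sqrt(3.0) * (2.0 * u - 1.0)


def midpoint_integral(func, steps=100000):
    """Midpoint-rule integral of func over (0, 1)."""
    h = 1.0 / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))


mean_j = midpoint_integral(logistic_optimal_score)
mean_j2 = midpoint_integral(lambda u: logistic_optimal_score(u) ** 2)
print(mean_j, mean_j2)
```

The two integrals come out at approximately 0 and 1, matching the normalization required of J.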


6. Orientation and applications.

6.A. Orientation. In Fraser's book [5] it is shown that the $c_1$-test has a limiting
normal distribution for normal alternatives. We have now shown this to be the
case for all alternatives (if we include the cases where $N\sigma_N^2 \to 0$ or
$N\sigma_N^2 \to \infty$ as degenerate cases). Hoeffding's U-statistics include many
nonparametric test statistics and he, Lehmann, and Dwass have shown that
U-statistics are asymptotically normal under the alternative hypothesis. The
U-statistics do not include all statistics of the form

(3.1) $mT_N = \sum_{i=1}^{N} E_{Ni} z_{Ni}$.

In particular $c_1$ is not a U-statistic. Dwass's results [3], summarized in Theorem
4, appear to be the only useful results for statistics of the form (3.1) under
general alternative hypotheses.

THEOREM 4. Suppose

(1) The conditions of the first paragraph of our Section 3 hold (Dwass has
written to us indicating that it is sufficient to have m and n approach $\infty$);

(2) The polynomial

$$P(t) = \sum_{j=1}^{h} b_j t^j$$

is non-degenerate, i.e., $\max(|b_1|, \dots, |b_h|) > 0$;

(3) $(X_1, \dots, X_m, Y_1, \dots, Y_n) = (U_1, \dots, U_N)$ and $R_i$ is the number of
U's less than or equal to $U_i$;

(4) $a_{Ni} = (n/mN)^{1/2}$ for $i = 1, \dots, m$, and $a_{Ni} = -(m/nN)^{1/2}$ for
$i = m + 1, \dots, N$;

(5) $t_N = \sum_{i=1}^{N} a_{Ni} P(R_i/N)$;

then

$$\lim_{N\to\infty} P\left\{\frac{t_N - E(t_N)}{\sigma_{t_N}} \le t\right\} = \int_{-\infty}^{t} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx.$$

First note

$$t_N = \left[\left(\frac{n}{mN}\right)^{1/2} + \left(\frac{m}{nN}\right)^{1/2}\right] mT_N - \left(\frac{m}{nN}\right)^{1/2}\sum_{i=1}^{N} P(i/N),$$

where in $T_N$ we have $J_N(i/N) = P(i/N)$. Thus there is a non-stochastic linear
relationship between $t_N$ and $T_N$. Hence, from the statistical viewpoint $t_N$ is
equivalent to $T_N$, a statistic of the form (3.1). Now let us compare Dwass's
conditions with ours.

1) Requiring $\lambda_N$ to be bounded away from 0 and 1 seems to be essential in
our Theorem 1.

2) The condition $E_{Ni} = J_N(i/N) = P(i/N) = \sum_{j=1}^{h} b_j (i/N)^j$ is much
stronger than our condition 4 in Theorem 1 in two respects: We only require that
$J_N(u)$ have a limit and the limit need not be a polynomial in u. Of particular
importance, we do not require $J(u)$ to be bounded on $0 < u < 1$. The requirement
$\max(|b_1|, \dots, |b_h|) > 0$ is to insure that $\sigma_{t_N} \ne 0$, a trivial case
which causes no difficulty.

6.B. Applications 

Example 1: Let Ey, > j '. Then Savage has proved [10] that Ty has a 
limiting Gaussian distribution under the hypothesis and is the test statistic for 
the locally most powerful rank test of 4, 6# against the alternative 6; ~ 4. 
where $F(x) = e^{\theta_1 x}$ and $G(x) = e^{\theta_2 x}$, $-\infty < x \le 0$, and
$F(x) = G(x) = 1$, $x > 0$. In order to verify that $T_N$ has a limiting Gaussian
distribution under the alternative hypothesis let us check the conditions of
Theorem 1. To do so we note that $J_N(i/N)$ is the expected value of the ith
smallest observation of a sample from the exponential distribution and that
Theorem 2 is applicable. Hence $T_N$ is asymptotically normal in all cases.
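
The statement that $J_N(i/N)$ is an expected exponential order statistic can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the classical formula $\sum_{j=N-i+1}^{N} 1/j$ for the expected ith smallest of N standard exponential observations and compares it with $J(u) = -\log(1 - u)$, the inverse of the exponential cdf. The function names and the choice N = 1000 are hypothetical.

```python
import math


def expected_exponential_order_stat(i, n):
    """Expected value of the i-th smallest of n standard exponential observations.

    Uses the classical representation sum_{j=n-i+1}^{n} 1/j.
    """
    return sum(1.0 / j for j in range(n - i + 1, n + 1))


def limiting_score(u):
    """J(u) = -log(1 - u), the inverse of the standard exponential cdf."""
    return -math.log(1.0 - u)


n = 1000
for i in (100, 500, 900):
    # The finite-sample score and its limit should be close for moderate i/n.
    print(i, expected_exponential_order_stat(i, n), limiting_score(i / n))
```

Away from the upper end of the range the two columns agree closely, which is the convergence $J_N(H) \to J(H)$ that Theorem 2 requires.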


Example 2: Van der Waerden [12] has developed the theory of the test statistic

$$T_N = \int_{-\infty}^{\infty} J\left(\frac{N H_N(x)}{N + 1}\right) dF_m(x),$$

where J is the inverse of the normal N(0, 1) cumulative distribution. It can be
shown that

$$\int_{0 < H_N < 1} \left| J\left(\frac{N H_N(x)}{N + 1}\right) - J(H_N(x)) \right| dH_N(x) = o\left(\frac{1}{\sqrt{N}}\right).$$

Then conditions 2 and 3 of Theorem 1 are established and the asymptotic
normality and efficiency properties for this statistic are verified to be the same as
those of the $c_1$-statistic.
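
A minimal sketch of the statistic in Example 2, assuming continuous data without ties: each X observation with rank R in the combined sample contributes $\Phi^{-1}(R/(N+1))$, and the average is taken over the m X's. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def van_der_waerden_statistic(xs, ys):
    """T_N = (1/m) * sum over the X sample of Phi^{-1}(R / (N + 1)).

    R is the rank of an X observation in the combined sample; ties are
    not handled (continuous data assumed).
    """
    m, n_total = len(xs), len(xs) + len(ys)
    combined = sorted(list(xs) + list(ys))
    rank = {value: r for r, value in enumerate(combined, start=1)}
    inv = NormalDist().inv_cdf
    return sum(inv(rank[x] / (n_total + 1)) for x in xs) / m


xs = [2.1, 3.4, 5.0]
ys = [0.3, 1.2, 2.9]
t_xy = van_der_waerden_statistic(xs, ys)
t_yx = van_der_waerden_statistic(ys, xs)
print(t_xy, t_yx)
```

Because the scores $\Phi^{-1}(i/(N+1))$ sum to zero over the combined sample, the two values printed for equal sample sizes are negatives of each other.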


7. Higher order terms. In proving that the C terms of Theorem 1 are uniformly
of higher order the following elementary results are used repeatedly.
7.A. Elementary results.


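1. $H \ge \lambda_N F \ge \lambda_0 F$.

2. $H \ge (1 - \lambda_N)G \ge \lambda_0 G$.

3. $1 - F \le \dfrac{1 - H}{\lambda_N} \le \dfrac{1 - H}{\lambda_0}$.

4. $1 - G \le \dfrac{1 - H}{1 - \lambda_N} \le \dfrac{1 - H}{\lambda_0}$.

5. $F(1 - F) \le \dfrac{H(1 - H)}{\lambda_N^2} \le \dfrac{H(1 - H)}{\lambda_0^2}$.

6. $G(1 - G) \le \dfrac{H(1 - H)}{(1 - \lambda_N)^2} \le \dfrac{H(1 - H)}{\lambda_0^2}$.

7. $dH \ge \lambda_N\,dF \ge \lambda_0\,dF$.

8. $dH \ge (1 - \lambda_N)\,dG \ge \lambda_0\,dG$.

9. Let $(a_N, b_N)$ be the interval $S_{N\epsilon}$, where

(7.1) $S_{N\epsilon} = \left\{x : H(1 - H) > \dfrac{\eta_\epsilon \lambda_0}{N}\right\}$.

Then $\eta_\epsilon$ can be chosen independently of F, G and $\lambda_N$ so that

(7.2) $P\{X_i \in S_{N\epsilon},\ Y_j \in S_{N\epsilon},\ i = 1, 2, \dots, m,\ j = 1, 2, \dots, n\} \ge 1 - \epsilon$.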


10. $\int J(H(x))\,dF(x)$ is finite.

PROOF. Using assumption 4 of Theorem 1 and A.7

$$\int_{-\infty}^{\infty} |J(H(x))|\,dF(x) \le K\int_{-\infty}^{\infty} [H(1 - H)]^{-\frac12+\delta}\,dH = K\int_0^1 \frac{dH}{[H(1 - H)]^{\frac12-\delta}} \le K.$$
7.B. Detailed consideration of the second order terms of Theorem 1. We are now
ready to show that the C terms are uniformly of higher order. We begin with
$C_{1N}$ and prove the following identity:

(7.4) $C_{1N} = \lambda_N\int (F_m - F)J'(H)\,d(F_m(x) - F(x)) = \dfrac{\lambda_N}{2}\int J'(H)\,d(F_m - F)^2 + \dfrac{\lambda_N}{2m}\int J'(H)\,dF_m(x)$.


m 







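Let R be the set of points of increase of $F_m$ and $\bar R$ its complement, the
$X_i$ being taken in increasing order so that $F_m(X_i) = i/m$. Then the
right-hand side of the identity becomes

$$\frac{\lambda_N}{2}\left[\int_{\bar R} J'(H)\,d(F_m - F)^2 + \sum_{i=1}^{m} J'(H(X_i))\left\{\left(\frac{i}{m} - F(X_i)\right)^2 - \left(\frac{i-1}{m} - F(X_i)\right)^2\right\}\right] + \frac{\lambda_N}{2m^2}\sum_{i=1}^{m} J'(H(X_i))$$

$$= \lambda_N\int_{\bar R} J'(H)(F_m - F)\,d(F_m - F) + \frac{\lambda_N}{2}\sum_{i=1}^{m} J'(H(X_i))\left[\frac{2}{m}\left(\frac{i}{m} - F(X_i)\right) - \frac{1}{m^2}\right] + \frac{\lambda_N}{2m^2}\sum_{i=1}^{m} J'(H(X_i))$$

$$= \lambda_N\int (F_m - F)J'(H)\,d(F_m - F).$$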
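Using this identity we integrate by parts and obtain

(7.5) $C_{1N} = -\dfrac{\lambda_N}{2}\,(C_{11N} + C_{12N} - C_{13N})$,

where

$$C_{11N} = \int_{S_{N\epsilon}} (F_m - F)^2 J''(H)\,dH,$$

$$C_{12N} = \int_{\bar S_{N\epsilon}} (F_m - F)^2 J''(H)\,dH,$$

$$C_{13N} = \frac{1}{m}\int J'(H(x))\,dF_m(x) = \frac{1}{m^2}\sum_{i=1}^{m} J'(H(X_i)),$$

where $S_{N\epsilon}$ was defined in 7.A.9 and $\bar S_{N\epsilon}$ denotes its complement.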
Now let us consider the random variable C),y . We find 


&|Cuv| = 84] (Fm — F)?\J”(H) | dH} [ a tae PY 


“SNe “SNe NXy 


Now using assumption 4 of Theorem 1 and 7.A.5 we obtain 
ec vi H(i — H) dH 
oe fH(L — HD 


I 


H' 3 dH 
Vv 
Now using the Markoff inequality ([2], p. 182),
K N’” K 


Pr (/Cuw| > aN") 3 yi a aN®’ 
where A may depend on e. Now consider Cipy . 
Let H, = H(ay), Hz = H(by) as in 7.A.9. Then H, 1— H. < K/N. 
With probability greater than 1 — ¢« we have 
; ~Hy 1 
Cry = | (F,, — F)°J”"(H) dH = | (1 — F) J” (H) dH 


“Sye “0 


F*J”(H) dH + | 
“He 


aa H> dH * (a — H)’ dH 
Cun |S — - 
- 7 | (H(1 — H))i ” . (H(1 — H))*” 


WA 
~ 
= 


dH <= KN’ 


Hence $C_{11N} + C_{12N}$, which does not involve $\epsilon$, is $o_p(N^{-1/2})$. Now to
complete the study of $C_{1N}$ we investigate $C_{13N}$:


Cu, = > J’ [H(X))] = - lH(X)(Q — H(X))P* 
m i m? ios 
We may assume 6 < } or 6 < } without loss of generality. Then using 7.4.5 
Cuw 4. > [F(X) [1 — FCXy|) 
N m 1 


which is distribution free. By a theorem of Marcinkiewicz ([8], pp. 242-243),
if a random variable Y has rth order moment finite (0 < r < 1), then the sum
of N independent observations on Y is $o_p(N^{1/r})$. If X has cdf F,


(F(X)l — F(X)! 


has a finite moment of order 2/(3 — 6) and hence 


Consequently Ciy = 0,(N 
Next consider 


7.6) Cw = (1— dx) | G. — @J'(H) a[F.(x) — F(x)). 
We hav e 

Ces (1 — Aw)M(Cuay + Cas 
where 


Cax = | ‘(G, — G)J’(H) d\ F,,(x) —F x)}, 


(G. — G)J'(H) a[F.(z) — F(2)]. 


os 
4 
— 





With probability greater than 1 — ¢, there are no observations in Sy, and 


; . wr 
Cayw| SA H(i — AH — A)\** dH(2) < Kk (%) . 
Since the two samples are independent and &(G, -- G) = 0, we have 


ELECwwy | Xi, Xe, 


-d|F n(x) — F(z)] d|F,,(y) —F(y)|, 


| G(x){1 _ G(x)|{J’[H(x)|}° dF,,.(2x), 


nm Jsy, 


mn 


— I] G(x)|1 — G(y)|J’|H(2)|J’[H(y)| dF (a) dF(y)' 
i . 


Ne 


< > [J H(x)[1 — H(y)| | JH (@)\J'1H(y)) | dH (a) dy) 


| (1-2) *y'"*( — y)** drdy vs 


GA — G)(J'TA))° : H( — H)\*” dH(z 


Henee 
(Cow |X, X2, 
vhere AK may depend on ¢ and 
a 
since 
Piles > CBlCee | Xe. ---. X.)) < 1a. 


( onsequently Css 1 — Av (Cay + Cay) which does not involve e, satisfies 


s integrand has alread) 
Now consider


Cw = | [Hy(x) — H(x))'J" eH x(x) + (1 — ¢)H(z)] dF,.(z), 
7.7) “0<Hy(z) <1 


Ode < lL. 


With probability greater than $1 - \epsilon$, the range of integration $0 < H_N(x) < 1$
can be replaced by $S_{N\epsilon}$ without changing $C_{3N}$. Since


= ©) H(z) | _ 
i oP, | Hate) | ~ 20) 
and 

(7.8b) sup 1 — Ale) = O,(1), 


Hnw<t H x(x) 


for each « > 0, there is an n > 0such that with probability greater than 1 — «, 
we have for 0 < Hy(x) < 1, 


(7.9) (eHy + (1 — ¢)H][l — (Hy + (1 — ¢)H)] > nt A(z) — (2). 


Then 


Cw\s | (Hx(z) — A(x) PQ?) (at — Ay} aP,(z) = (9) "Cay. 


a : (1 — F)(Q1 — 2F) 
&( | Cuy <x/ law Fe — F) peneenclagtgereenen 
BLN /2 ie ] + N 
+ (1—aAyGi — 6) | _ H)\** dF(z) 
se] wa-mran+e [wa - mr ar 
< Kn.” + Kn, 
~ NR Nw 
Consequently

$$C_{3N} = o_p(N^{-1/2}).$$

The $C_{4N}$ term vanishes unless the greatest of the N = m + n observations
is an X. In that case


(7.10) Cw = —{ —J{H(X,.)] — (1 — H(X,,)]J1H(X,,)]}. 


m 
Using 7.4.9, however, 


a (vv }—3+6 *\—HS © 
1s acx,)) | < Xe = HI Sn) 


m m - Niw 
with probability at least $1 - \epsilon$. Hence


1 SH(X,)] = 0,(N7). 


m 
Similarly 
| — H(Xw)WH(Xw)) |< = HO acy yy — Wx) 
mw m 
[(H(X,,)\[1 — HC(X,,)]}" yy—1/2 —1/2 

ee  scbend i — «= ¢g N 0 ( = ¢ (N ° 

= mH (X m) 7 nies 1) 7 ) 
Hence

$$C_{4N} = o_p(N^{-1/2}).$$

The negligibility of $C_{5N}$ and $C_{6N}$ follows immediately from Assumptions 2 and 3
of Theorem 1.


7.C. Proof of Theorem 2. First we note that

(7.11) $J_N\left(\dfrac{i}{N}\right) = E_{Ni} = \int_0^1 J(u)\,g_{iN}(u)\,du$,

where

(7.12) $g_{iN}(u) = \dfrac{N!}{(i - 1)!\,(N - i)!}\,u^{i-1}(1 - u)^{N-i}$

is the density of the ith order statistic from the uniform distribution on [0, 1]
and incidentally has mean $i/(N + 1)$ and variance
$i(N - i + 1)/[(N + 1)^2(N + 2)]$. Then we have


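(7.13) $|E_{N1}| \le KN\int_0^1 [u(1 - u)]^{-\frac12+\delta}(1 - u)^{N-1}\,du = KN\,\dfrac{\Gamma(N - \frac12 + \delta)\,\Gamma(\frac12 + \delta)}{\Gamma(N + 2\delta)} \le KN^{\frac12-\delta}$.

By a symmetric argument the desired result $J_N(1) = o(N^{1/2})$ follows.
Furthermore we have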


- | | Per oe Va ean 
(7.14) Js(3) -— 4 (3) < KN +K{ i (1-2) < KN 


‘ 


Before proceeding to bound $J_N(i/N) - J(i/N)$ for $1 < i \le N/2$ we apply
the Stirling formula

(7.15) $\log x! = \log\Gamma(x + 1) = \tfrac12\log 2\pi - x + (x + \tfrac12)\log x + \dfrac{\theta}{12x}$, $\quad 0 < \theta < 1$,
with a rather standard argument to obtain for $1 < i \le N/2$, $0 < u \le$
(N — 1), 

v2 (N—1) 
; . ( mao - [tne i ] ., 
(7.16) és u J = iene ———- ¢ . 
_— Gin) 2 4 2r(t — 1)(N — 2) t 5 


where 


(7.17 = (N — lju — (i — 1). 


ol . 
I EG —J (+)| Gi v(u) du 


= Du + Dy + Dan + Dn + Di + Dy, 


For l 


where 


Dy - | J (udgi.w (u) du, 


Dn _ -f is ‘) gi.w (u) du, 


l 


Dy =: } =) J” (u*)giw (u) du, 


u* N, and 
1 — a)! 
(i — 1)! 
Ss Ku"N"gi-..n—a(u), 
where a + — 6and we assume 6 < } and thus a > 0 without loss of general- 
ity. Let @ be the normal edf. Then 


Dui | K[u(l — u)] "Ku"N"gi-a.w—a (u) du S KN* 


- 


| (u es V 


L a/ 
Du! < KN"@(—" 
: AK 


K represents a generic constant independent of i, N, $\lambda_N$, F, and G.

This equation is related to the asymptotic normality of order statistics and is
derived by an operation similar to the direct proof of the asymptotic normality of
the binomial distribution.
‘2 has the


= ginv(1 — u) for] <i S N/2 andO Ss u S 1/2, | Dye 


Since giw(u) 2 
1i-—] eta 3/2 fe 
ot Vv :) (N — 1) KN"@ ( rt ‘) 


same bound as | Dy |. Similarly 
V(i — 1)(N — 1) 


(7.21) |Da\s 


and | De | has the same bound too. Since the expectation of the 7th order statistic 


from the uniform distribution is 7/(N + 1), 


D; = -' (4). | ( _- ‘Yo v (u) du 
; 


el 
L 
‘ae .( > 
+ / a (u t) gin (u) du + xy Ea 


ginw(u) S Kh(l — u foru <u. 


; 
h(u) = to~- = 
N 


Hence 


Finally 


< Ku" | (1 _ .) gi.x (u) du, 


7 ee + 1) 
D, s Ku; V + 1)(N + 2) (x . 


=t+ 


7 92) p00 | ot 2. ae 
(7.23) Dy s Ku; |* y +K | s-y 8 
Thus, for 1 < i < N/2, 

(7.24) 


and 
[Jx(Hx) — J(Hy)] dF x | 
| 


F N/2 
ss ilKkwe + ¥ KN" | (- 
mi t=<2 \ 


s KN** 
since $\sum_{i=1}^{\infty} \Phi(-\sqrt{i}/K)$ and $\sum_{i=1}^{\infty} i^{-(1+\delta)}$
converge. By a symmetric argument we can cover the range $N/2 < NH_N \le N$ and
our theorem follows.
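
The mean and variance quoted for the uniform order-statistic density (7.12) in this proof can be confirmed numerically. The sketch below is only an illustration: the midpoint rule, the step count, and the case i = 3, N = 10 are arbitrary choices, and the function names are hypothetical.

```python
import math


def order_stat_density(u, i, n):
    """Density (7.12) of the i-th order statistic of n uniform(0, 1) variables."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_coeff = math.lgamma(n + 1) - math.lgamma(i) - math.lgamma(n - i + 1)
    return math.exp(log_coeff + (i - 1) * math.log(u) + (n - i) * math.log(1.0 - u))


def moments(i, n, steps=20000):
    """Midpoint-rule mean and variance of the i-th uniform order statistic."""
    h = 1.0 / steps
    grid = [(k + 0.5) * h for k in range(steps)]
    dens = [order_stat_density(u, i, n) for u in grid]
    mean = h * sum(u * d for u, d in zip(grid, dens))
    second = h * sum(u * u * d for u, d in zip(grid, dens))
    return mean, second - mean ** 2


i, n = 3, 10
mean, var = moments(i, n)
print(mean, i / (n + 1))
print(var, i * (n - i + 1) / ((n + 1) ** 2 * (n + 2)))
```

Each printed pair should agree to several decimal places, the second entry being the closed form stated in the text.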
A HIGH DIMENSIONAL TWO SAMPLE SIGNIFICANCE TEST¹


By A. P. DEMPSTER
Bell Telephone Laboratories², Murray Hill, New Jersey


0. Summary. The classical multivariate 2 sample significance test based on
Hotelling's $T^2$ is undefined when the number k of variables exceeds the number of
within sample degrees of freedom available for estimation of variances and co- 
variances. Addition of an a priori Euclidean metric to the affine k-space assumed 
by the classical method leads to an alternative approach to the same problem. A 
test statistic F which is the ratio of 2 mean square distances is proposed and 3 
methods of attaching a significance level to F are described. The third method is 
considered in detail and leads to a “non-exact” significance test where the null 
hypothesis distribution of F depends, in approximation, on a single unknown 
parameter r for which an estimate must be substituted. Approximate distribution 
theory leads to 2 independent estimates of r based on nearly sufficient statistics 
and these may be combined to yield a single estimate. A test of F nominally at 
the 5% level but based on an estimate of r rather than r itself has a true signifi- 
cance level which is a function of r. This function is investigated and shown to be 
quite near 5%. The sensitivity of the test to a parameter measuring statistical 
distance between population means is discussed and it is shown that arbitrarily 
small differences in each individual variable can result in a detectable overall 
difference provided the number of variables (or, more precisely, r) can be made 
sufficiently large. This sensitivity discussion has stated implications for the a 
priori choice of metric in k-space. Finally a geometrical description of the case of 
large r is presented. 


1. Introduction. The statistical problem here treated is that of significance 
testing for the difference of the means of 2 k-variate populations which may be 
assumed to have the same structure of variances and covariances, the test being 
based on a sample from each population with sample sizes denoted by $n_1$ and $n_2$.
It is intended to provide a method applicable to data where the number k of 
characteristics measured on each individual is large but where the number of 
individuals measured may be quite small. The usual method of classical multi- 
variate statistics encounters a mathematical barrier and becomes inapplicable 
when $k > n_1 + n_2 - 2$, but certainly the need has arisen in applied statistical
work for techniques handling small samples of highly described individuals. 

The classical method has 2 equivalent formulations in terms of the $T^2$ statistic
of Hotelling [2] or the best linear discriminator of Fisher [3]. For this method the

Received July 8, 1957; revised June 27, 1958.

¹ Most of the material presented here is also contained in the author's PhD thesis [1] at
Princeton University. His work in Princeton was supported principally by the National
Research Council of Canada.

² Now at Harvard University.
space of the k characteristics is thought of as k-dimensional affine space and
needs no further structure: the method is invariant over the choice of any k
linear combinations of full rank of the given k variables to be used in place of the
given variables. The 2 populations are assumed to be probability distributions
over affine k-space and the samples constitute $n_1 + n_2$ points of this space. In the
formulation of [3] the sample points are projected along a family of parallel
(k - 1)-dimensional hyperplanes onto any line, the family being chosen so that
the one-dimensional Student's t for the 2 samples is maximized. This $t_{\max}$ is then
used to test the significance of the difference in population means. However, if
$k > n_1 + n_2 - 2$, a family of (k - 1)-dimensional hyperplanes can be chosen
which projects the points into 2 points, one for each sample. Then $t_{\max} = \infty$
regardless of the populations and so is useless as a test-statistic. In the
formulation of [2] the samples are used to define a Euclidean metric in the affine
k-space and the test-statistic is the distance between the 2 sample means in this
metric. This metric is based on the variation of the samples about their means, and
if the samples are shifted to have a common mean point and $k > n_1 + n_2 - 2$ the
variation spans only a subspace of $n_1 + n_2 - 2$ dimensions. Thus it is not
surprising that in this case the method of defining the metric breaks down.
Furthermore it is heuristically evident that no metric for a whole affine space can
be well-defined from variation taking place in a flat subspace. For these reasons we
are forced to give up the classical approach with its elegant mathematical
property of affine invariance.

The approach of this paper is based on the observation that, whatever metric 


is chosen for k-space, the distance between sample means is a statistic which may 
yield evidence of separation of the populations, and, rather than be preoccupied 
with a choice of optimum metric from the data, we should try to use a metric 
determined apart from the data and analyze the information yielded through 
this metric. 


For much of the theory the population distributions will be assumed to be 
(multivariate) normal. 


2. The general method. It is assumed that a Euclidean metric has been 
assigned to the affine k-space of the k characteristics; that is, k independent 
linear combinations of the given variables have been chosen which define distance 
along k mutually orthogonal axes of Euclidean k-space. The metric may be 
thought of as chosen from a priori knowledge (precise or imprecise) of the joint 
distributions of the k characteristics, in the hope of roughly sphericalizing these 
distributions. More detailed remarks on the choice of a metric are to be found in 
section 5. 

Suppose that the 2 population distributions have means denoted by $k \times 1$
vectors $\nu_1$ and $\nu_2$ and common $k \times k$ matrix of variances and covariances
denoted by A. We are seeking evidence that $\nu_0 = \nu_1 - \nu_2$ is different from zero
and are naturally led to consider $V_0$, the vector joining the sample means. $V_0$ is
an unbiased estimate of $\nu_0$. Having a metric at hand we will try to direct a
significance test at the detection of a non-zero length of $\nu_0$ and will use the
length of $V_0$ in estimating this length. Rejection of the null hypothesis $\nu_0 = 0$
will result from evidence that the length of $V_0$ is significantly greater than zero.

So far this use of $V_0$ has been justified mostly on heuristic grounds. It makes
sense geometrically. If however we assume that the populations are multivariate
normal $N(\nu_1, A)$ and $N(\nu_2, A)$ a more mathematical reason may be given.
Suppose the $n_1 + n_2$ individuals are regarded as defining a set of orthogonal axes
in a Euclidean space of $n_1 + n_2$ dimensions. The space may be regarded as
"degree of freedom" (d.f.) space and any set of orthogonal axes defines a set of
orthogonal d.f. Such a new set of d.f. may be defined as follows: first choose the
d.f. measuring the grand mean of the $n_1 + n_2$ individuals, second choose the
d.f. measuring the difference between the means of the 2 samples, and third
choose any set of $n_1 + n_2 - 2$ d.f. which together with the first 2 form an
orthogonal set. This last set represents "within sample" d.f. Their number
$n_1 + n_2 - 2$ will henceforth for convenience be denoted by m. The data, which
consists of k points in this $(n_1 + n_2)$-space, can be described by a set of
$n_1 + n_2$ $k \times 1$ vectors corresponding to the new d.f. Let $U_0$ be the vector
corresponding to the mean difference d.f. and $U_1, U_2, \dots, U_m$ be the vectors
corresponding to the within sample d.f. It can be easily checked that

$$U_0 = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} V_0,$$

that $U_1, U_2, \dots, U_m$ have mean 0, and that $U_0, U_1, \dots, U_m$ are uncorrelated
and each have A for matrix of variances and covariances. Finally, assuming
normality and defining

$$\xi = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} \nu_0,$$

it is seen that $U_0, U_1, \dots, U_m$ are independent, the first being distributed as
$N(\xi, A)$ and the remainder as N(0, A). With the normality assumption it is clear
that $U_0, U_1, \dots, U_m$ are sufficient for the parameters $\nu_0$ and A, for apart from
an irrelevant overall translation of both samples the original data can be
reconstructed. But since $U_0$ is the only one of these vectors involving the
parameter $\nu_0$ it is natural to choose a property of $U_0$ alone in testing
significance.

Three methods of testing whether or not $U_0$ is significantly long will be
described, but only the third of these will be pursued. The first is the
non-parametric randomization test based on the method of Pitman and Welch [4, 5].
For each of the $\binom{n_1 + n_2}{n_1}$ divisions of the $n_1 + n_2$ individuals into
2 groups of $n_1$ and $n_2$ there is a corresponding d.f. for group difference and
corresponding vector U. Under the null hypothesis that the $n_1 + n_2$ individuals
are a sample from one distribution the lengths of all these vectors U have a joint
distribution symmetric under permutation of the vectors. Accordingly $U_0$ is
significantly long at level $\alpha$ if the length of $U_0$ is beyond the $(1 - \alpha)$
point of the sample cumulative distribution of the set of $\binom{n_1 + n_2}{n_1}$
lengths of vectors U. The second method is the continuous analogue of the first
method which comes into play when normal distributions are assumed. Suppose a set
of p d.f. are chosen independently at random uniformly with regard to direction in
that part of $(n_1 + n_2)$-space orthogonal to the d.f. for the grand mean. The set
of p corresponding vectors together with $U_0$ have, under the null hypothesis of
identical normally distributed populations, joint distributions which are again
symmetric under permutations so that a significance test may be defined as in the
first method. The limiting test as $p \to \infty$ is uniquely defined and may be
regarded as the continuous analogue of the Pitman and Welch procedure. For k = 1
this amounts to the usual t test, but for general k the distribution associated
with the limiting test appears difficult to handle analytically. However the test
could be approximated using a suitable p and experimental sampling.
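
The first (randomization) method can be sketched directly for small samples. The code below is an illustration under stated assumptions, not the author's procedure in detail: it takes the a priori metric to be the ordinary Euclidean one in the given coordinates, enumerates every division into groups of $n_1$ and $n_2$, and reports the fraction of divisions whose vector U is at least as long as the observed $U_0$. The function names are hypothetical.

```python
from itertools import combinations


def mean_vector(rows):
    k = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(k)]


def squared_length_u(group1, group2):
    """Squared length of U = (1/n1 + 1/n2)^(-1/2) * (mean1 - mean2)."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    scale = 1.0 / (1.0 / len(group1) + 1.0 / len(group2))
    return scale * sum((a - b) ** 2 for a, b in zip(m1, m2))


def randomization_p_value(sample1, sample2):
    """Fraction of all divisions whose squared length is at least the observed one."""
    pooled = list(sample1) + list(sample2)
    n1, total = len(sample1), len(pooled)
    observed = squared_length_u(sample1, sample2)
    count = hits = 0
    for idx in combinations(range(total), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(total) if i not in chosen]
        count += 1
        # Small tolerance so the observed division always counts itself.
        if squared_length_u(g1, g2) >= observed - 1e-12:
            hits += 1
    return hits / count


sample1 = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
sample2 = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
p_value = randomization_p_value(sample1, sample2)
print(p_value)
```

With two well-separated samples of size 3, only the observed division and its mirror image attain the maximum length among the 20 divisions, so the reported fraction is 2/20. Full enumeration is practical only for small $n_1 + n_2$.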

The third method, which is the concern of most of the subsequent discussion,
is also based on normal distribution theory. The idea here is to compare the
length of $U_0$ directly against the lengths of $U_1, U_2, \dots, U_m$, since under the
null hypothesis they form a sample of size m + 1 from a certain distribution.
Define $Q_i$ = squared length of $U_i$ (i = 0, 1, ..., m) and

$$F = Q_0 \Big/ \frac{1}{m}\sum_{i=1}^{m} Q_i.$$

Then $U_0$ will be declared significantly long if F is significantly large. If the
null hypothesis distribution of F involved no unknown parameters then an exact
test could be based on F; since this is not the case a type of "non-exact
significance test" will be introduced.
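
A minimal sketch of computing F from raw data, under two assumptions that go beyond the text: the metric is taken to be the ordinary Euclidean one in the given coordinates, and the sum of the within-sample $Q_i$ is obtained as the pooled sum of squared deviations from the two sample means (which follows because an orthogonal change of d.f. preserves total squared length). The function names are hypothetical.

```python
def mean_vector(rows):
    k = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(k)]


def within_sum_of_squares(rows):
    """Sum of squared deviations of the rows from their own mean vector."""
    center = mean_vector(rows)
    return sum((x - c) ** 2 for row in rows for x, c in zip(row, center))


def dempster_f(sample1, sample2):
    """F = Q_0 / ((1/m) * sum_{i=1}^m Q_i) with m = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    m = n1 + n2 - 2
    mean1, mean2 = mean_vector(sample1), mean_vector(sample2)
    # Q_0: squared length of U_0 = (1/n1 + 1/n2)^(-1/2) * V_0.
    q0 = sum((a - b) ** 2 for a, b in zip(mean1, mean2)) / (1.0 / n1 + 1.0 / n2)
    within = within_sum_of_squares(sample1) + within_sum_of_squares(sample2)
    return q0 / (within / m)


f_univariate = dempster_f([[1.0], [2.0], [3.0]], [[2.0], [4.0], [6.0]])
f_high_dim = dempster_f(
    [[0.1, 0.2, 0.0, 0.3, 0.1], [0.0, 0.1, 0.2, 0.1, 0.4]],
    [[1.0, 0.9, 1.2, 1.1, 0.8], [1.1, 1.0, 0.9, 1.3, 1.0]],
)
print(f_univariate, f_high_dim)
```

For k = 1 the value coincides with the square of the usual two-sample t statistic, and the second call shows that F remains defined when k = 5 exceeds m = 2.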


3. Distribution theory. The distributions involved in the non-exact significance
test are those of properties of the vectors $U_0, U_1, U_2, \dots, U_m$, in particular
their lengths and angles between pairs of them. We suppose in this section
normal distributions and so may deal with a typical vector U distributed as
N(0, A) or a typical sample of such vectors. Under these assumptions Q, the
squared length of U, has the distribution of a quadratic form in k normal
variables. Since this distribution in precise form involves k parameters, all
unknown, we will rely on the well-known [6] approximation which treats Q as
distributed as $\mu\chi_r^2$ depending only on 2 unknown parameters $\mu$ and r. The
parameters $\mu$ and r are generally fitted by equating the first 2 moments, and this
results in the inequality $r \le k$.
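
The moment fit can be written out explicitly. The sketch assumes that U is N(0, A) in coordinates orthonormal for the chosen metric, so that Q has mean equal to the sum of the eigenvalues of A and variance equal to twice the sum of their squares; matching these to the mean $\mu r$ and variance $2\mu^2 r$ of $\mu\chi_r^2$ gives the formulas below. The function name and the eigenvalues are hypothetical.

```python
def fit_scaled_chi_square(eigenvalues):
    """Match the first two moments of Q to those of mu * chi-square_r.

    E(Q) = sum(l) = mu * r and Var(Q) = 2 * sum(l^2) = 2 * mu^2 * r, so
    mu = sum(l^2) / sum(l) and r = sum(l)^2 / sum(l^2).
    """
    total = sum(eigenvalues)
    total_sq = sum(v * v for v in eigenvalues)
    mu = total_sq / total
    r = total * total / total_sq
    return mu, r


print(fit_scaled_chi_square([2.0, 2.0, 2.0, 2.0]))  # spherical case: r equals k
print(fit_scaled_chi_square([4.0, 1.0, 1.0]))       # unequal variances: r below k
```

The fitted r equals k only when all eigenvalues are equal and is smaller otherwise, by the Cauchy-Schwarz inequality.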

This approximation, at least for integral r, corresponds to approximating the
distribution of vector U by a spherical normal distribution lying in a flat
subspace of dimension r in k-space. Stated more precisely this says that in the
metric chosen for k-space there is an orthogonal transformation to coordinates
$(y_1, y_2, \dots, y_k)$ such that the distribution of U is defined by

(i) density $\dfrac{1}{(2\pi\mu)^{r/2}} \exp\left(-\dfrac{1}{2\mu}\sum_{i=1}^{r} y_i^2\right)$ for $y_1, y_2, \dots, y_r$, and

(ii) $y_{r+1}, y_{r+2}, \dots, y_k$ are zero with probability one.

Having this approximate underlying distribution for U it is possible to define
from it approximate distributions for other statistics based on U.

As a first example consider the angle $\theta$ between a pair of vectors $U$ and $U'$
independently distributed according to (i) and (ii) above. Due to the spherical
symmetry of the distribution in $r$-space the conditional distribution of $\theta$ given
$U'$ does not depend on the particular $U'$, so that the distribution of $\theta$ is the
distribution of the angle between $U$ and any fixed direction, e.g., that of the
$y_1$-axis. Thus $\cos^2\theta$ is distributed as

$$\frac{y_1^2}{y_1^2 + y_2^2 + \cdots + y_r^2},$$

i.e. $\cos^2\theta$ has the $\beta$ distribution $\beta_{1/2,(r-1)/2}$ defined by density

$$\frac{\Gamma\left(\frac{r}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{r-1}{2}\right)}\; x^{-1/2}(1-x)^{(r-3)/2}, \qquad 0 \le x \le 1.$$

This will be used as an approximation to the distribution of $\cos^2\theta$ under the
circumstances where $\mu\chi^2_r$ is used as an approximation to the distribution of $Q$.
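As a rough numerical check of the spherical model, the mean of the $\beta_{1/2,(r-1)/2}$ distribution is $1/r$, so the average squared cosine between independent spherical normal vectors in $r$ dimensions should be close to $1/r$. The sketch below simulates this. The function names, sample size and seed are illustrative choices, not part of the paper.

```python
import random


def squared_cosine(r, rng):
    """Squared cosine of the angle between two independent spherical
    normal vectors in r dimensions."""
    u = [rng.gauss(0.0, 1.0) for _ in range(r)]
    v = [rng.gauss(0.0, 1.0) for _ in range(r)]
    dot = sum(a * b for a, b in zip(u, v))
    return dot * dot / (sum(a * a for a in u) * sum(b * b for b in v))


def mean_squared_cosine(r, n_pairs=20000, seed=0):
    """Monte Carlo average of cos^2(theta); should be near 1/r."""
    rng = random.Random(seed)
    return sum(squared_cosine(r, rng) for _ in range(n_pairs)) / n_pairs
```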

Accepting these approximations it is natural to attempt to estimate $\mu$ and $r$.
In particular, estimation of $r$ plays a significant role in our non-exact test. The
distribution theory leading to estimates of $r$ will now be discussed. The vectors
$U_1, U_2, \cdots, U_m$ may be described by the set of their lengths and the set of
angles between pairs of them, and under the sphericalizing approximation
these 2 sets of random variables are independent of one another. From each of
these sets a statistic is defined which contains nearly all the information about
$r$ in the set and whose distribution may be approximated by a fast-converging
limiting form as $r \to \infty$, namely $[(1/r) + (c/r^2)]\chi^2_h$, where $c$ and $h$ must be
determined for each set. This leads to 2 independent estimates of $r$ which may be
combined into a single estimate.

Taking the set of lengths, define $Q_i$ as the squared length of $U_i$ and consider
$Q_1, \cdots, Q_m$ as $m$ independent observations from $\mu\chi^2_r$ with $\mu$ and $r$ unknown. The
results of this paragraph are found in [7]. The joint density function of $Q_1,
Q_2, \cdots, Q_m$ is

$$\left[\frac{1}{(2\mu)^{r/2}\,\Gamma(r/2)}\right]^{m} \left(\prod_{i=1}^{m} Q_i\right)^{r/2-1} \exp\left(-\frac{1}{2\mu}\sum_{i=1}^{m} Q_i\right)$$

so that $\prod_{i=1}^{m} Q_i$ and $\sum_{i=1}^{m} Q_i$ are a pair of sufficient statistics for $\mu$ and $r$. It is now
natural to look at

$$v = \frac{\prod_{i=1}^{m} Q_i}{\left(\sum_{i=1}^{m} Q_i\right)^{m}}$$

as a statistic not involving $\mu$ for the purpose of estimating $r$. From joint
characteristic functions $v$ and $\sum_{i=1}^{m} Q_i$ are seen to be independent. Thus

$$v \cdot \left(\sum_{i=1}^{m} Q_i\right)^{m} = \prod_{i=1}^{m} Q_i$$

where the 2 factors on the left are independent as are the $m$ factors on the right.
Since the distributions of $\sum_{i=1}^{m} Q_i$ and $Q_i$ are known this equation makes it
possible to immediately write down the moments of $v$ about 0 or the cumulants and
characteristic function of $\log v$. In this way we approach the limiting $\chi^2$
distribution of $\log v$ as $r \to \infty$ and show that the power series expansions in terms of
$(1/r)$ of the cumulants of the actual and asymptotic distributions agree up to the
terms in $(1/r)^2$. This asymptotic distribution is stated in [7] to be remarkably
good, with agreement of the first 4 cumulants to within 5% when $r$ is as small as 5.

Asymptotic expansions for the cumulants may be derived as follows. Define
$t = -\log(m^m v)$, and $K_s$ as meaning $s$th cumulant. Then for any $s$

$$K_s(\log v) + m^s K_s\left(\log \sum_{i=1}^{m} Q_i\right) = m K_s(\log Q_i),$$

$$K_s(\log v) + m^s K_s(\log \chi^2_{mr}) = m K_s(\log \chi^2_r).$$


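From [8] asymptotic formulas for the cumulants of $\log \chi^2_n$ are given by

$$K_1(\log \chi^2_n) = \log n - \frac{1}{n} + \sum_{j=1}^{\infty} \frac{(-4)^j B_j}{2j\, n^{2j}}
 = \log n - \frac{1}{n} - \frac{1}{3n^2} + \frac{2}{15n^4} - \frac{16}{63n^6} + \cdots$$

and, for $s \ge 2$,

$$K_s(\log \chi^2_n) = (-1)^s\, 2^{s-1}\left[\frac{(s-2)!}{n^{s-1}} + \frac{(s-1)!}{n^{s}}
 - \sum_{j=1}^{\infty} \frac{(-4)^j B_j\,(2j+s-2)!}{(2j)!\; n^{2j+s-1}}\right]$$

where $B_j$ are Bernoulli numbers ($B_1 = 1/6$, $B_2 = 1/30$, $B_3 = 1/42$, $\cdots$). Thence

$$K_1(t) = -m \log m - K_1(\log v)$$
$$\quad = -m \log m - m K_1(\log \chi^2_r) + m K_1(\log \chi^2_{mr})$$
$$\quad = (m-1)\left[\frac{1}{r} + \frac{1}{3r^2}\left(1 + \frac{1}{m}\right)
 - \frac{2}{15r^4}\left(1 + \frac{1}{m} + \frac{1}{m^2} + \frac{1}{m^3}\right) + \cdots\right]$$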
and for $s \ge 2$

$$K_s(t) = (-1)^s K_s(\log v) = (-1)^s\left[m K_s(\log \chi^2_r) - m^s K_s(\log \chi^2_{mr})\right]$$
$$\quad = \frac{2^{s-1}(s-1)!\,(m-1)}{r^s}\left[1 + \frac{s}{3r}\left(1 + \frac{1}{m}\right) + \cdots\right].$$

Since $\chi^2_{m-1}$ has cumulants $K_s = 2^{s-1}(s-1)!\,(m-1)$ it is seen that $t \sim (1/r)\chi^2_{m-1}$
with agreement in first terms of the expansions, and

$$t \sim \frac{1}{r}\left[1 + \frac{1}{3r}\left(1 + \frac{1}{m}\right)\right]\chi^2_{m-1}$$

with agreement in the first 2 terms, for all cumulants.
Thus $r$ may be estimated by $\hat r$ defined by

$$t = \frac{1}{\hat r}\left[1 + \frac{1}{3\hat r}\left(1 + \frac{1}{m}\right)\right](m-1)$$

and for $r$ moderately large the distribution $\chi^2_{m-1}$ can be used to put confidence
limits on $r$.
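A small sketch of the length-based estimate follows. It computes $t = -\log(m^m v)$ from the squared lengths and solves $t = (m-1)\,(1/\hat r)\,[1 + (1 + 1/m)/(3\hat r)]$ for $\hat r$ as a quadratic in $1/\hat r$. The function names are hypothetical, and the defining equation is the two-term approximation discussed above rather than an exact result.

```python
import math


def length_statistic(q):
    """t = -log(m^m * v) with v = prod(q) / sum(q)^m, written as
    m * log(mean(q)) - sum(log q_i); non-negative by the AM-GM inequality."""
    m = len(q)
    return m * math.log(sum(q) / m) - sum(math.log(x) for x in q)


def estimate_r_from_lengths(t, m):
    """Solve t = (m - 1) * (x + a * x**2) for x = 1/r, a = (1 + 1/m) / 3,
    and return r = 1/x (infinite when t is not positive)."""
    if t <= 0:
        return math.inf
    a = (1.0 + 1.0 / m) / 3.0
    y = t / (m - 1)
    x = (-1.0 + math.sqrt(1.0 + 4.0 * a * y)) / (2.0 * a)  # positive root
    return 1.0 / x
```

For example, with $m = 5$ and $r = 10$ the right-hand side equals $4 \times 0.1 \times 1.04 = 0.416$, and feeding $t = 0.416$ back into the solver returns $\hat r = 10$.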

Consider next the set of $\tfrac{1}{2}m(m-1)$ angles among $U_1, U_2, \cdots, U_m$. Set
$n = \tfrac{1}{2}m(m-1)$ and denote by $S_i$ $(i = 1, 2, \cdots, n)$ the squared sines of these
angles. Under the approximate model any $S_i$ is considered distributed as $\beta_{(r-1)/2,1/2}$,
but as a further consequence of spherical symmetry in $r$-space it may be noted
that any set of angles containing no closed subset is a mutually independently
distributed set, and in particular the angles are pairwise independent. Extending
this approximation to complete independence the joint density of the $S_i$ becomes

$$\left[\frac{\Gamma\left(\frac{r}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{r-1}{2}\right)}\right]^{n}
 \prod_{i=1}^{n} S_i^{(r-3)/2} \prod_{i=1}^{n} (1 - S_i)^{-1/2}$$

so that $\prod_{i=1}^{n} S_i$ or $\sum_{i=1}^{n} \log S_i$ appear as equivalent sufficient statistics for $r$, and so
contain approximately all the information about $r$ in the directional properties.

This leads to a consideration of $-\log \beta_{(r-1)/2,1/2}$. The density of $1 - \beta_{(r-1)/2,1/2}$
is easily seen to be asymptotically as $r \to \infty$ the density of $(1/r)\chi^2_1$, and since
$\beta_{(r-1)/2,1/2} \to 1$ in probability as $r \to \infty$ it follows that

$$\frac{-\log \beta_{(r-1)/2,1/2}}{1 - \beta_{(r-1)/2,1/2}} \to 1$$

in probability as $r \to \infty$, so that $-\log \beta_{(r-1)/2,1/2}$ is also asymptotically distributed
as $(1/r)\chi^2_1$.

Direct asymptotic expansions for the cumulants of $-\log \beta_{(r-1)/2,1/2}$ show that,
as with statistic $t$, this last asymptotic distribution can be modified to have
agreement in the first 2 terms. For, since $\chi^2_{r-1} = \beta_{(r-1)/2,1/2} \cdot \chi^2_r$ with independence
on the right (as may be seen by computing the characteristic functions of the
logs of these random variables),

$$K_s(-\log \beta_{(r-1)/2,1/2}) = (-1)^s K_s(\log \beta_{(r-1)/2,1/2})
 = (-1)^s\left[K_s(\log \chi^2_{r-1}) - K_s(\log \chi^2_r)\right]$$

for all $s$. Thence

$$K_1(-\log \beta_{(r-1)/2,1/2}) = \frac{1}{r}\left[1 + \frac{3}{2r} + \cdots\right]$$

and for $s \ge 2$

$$K_s(-\log \beta_{(r-1)/2,1/2}) = \frac{2^{s-1}(s-1)!}{r^s}\left[1 + \frac{3s}{2r} + \cdots\right]$$

so that

$$-\log \beta_{(r-1)/2,1/2} \sim \left(\frac{1}{r} + \frac{3}{2r^2}\right)\chi^2_1$$

with agreement to second terms in the expansions and therefore usable accuracy
for quite small $r$.

Now we may regard

$$-\sum_{i=1}^{n} \log S_i \sim \left(\frac{1}{r} + \frac{3}{2r^2}\right)\chi^2_n$$

and obtain a new estimate of $r$. Since in approximation the angles were more than
pairwise independent the first 2 moments of this last are asymptotically faithful
to the approximate model. The remaining moments however will be distorted
slightly on account of non-independence in a way which is difficult to investigate.

Finally an estimate of $r$ can be obtained from $t - \sum_{i=1}^{n} \log S_i$ regarded as
asymptotically $(1/r)\chi^2_{m+n-1}$ or an appropriate refinement for small $r$.


4. The non-exact significance test. The question is discussed here of what can
be had in the way of a significance test based on $F = Q_0 / [(1/m)\sum_{i=1}^{m} Q_i]$ considered
as $F_{r,mr}$ under the null hypothesis where $r$ is unknown but estimated from a
statistic $w$ considered distributed as $f(r)\chi^2_n$, independent of $F$, with $f(r)$ equal to
$1/r$ or an asymptotically equivalent refinement of $1/r$. The point estimate of $r$
found from the equation $w = f(r) \cdot n$ will be denoted by $\hat r$ and the term 100p%
confidence point of $r$ will indicate the value of $r$ satisfying $w = f(r)\chi^2_{n(p)}$ where
$\chi^2_{n(p)}$ denotes the 100p% point of $\chi^2_n$. Similar notation will be used for percentage
points of other distributions.

A statistical test may be termed exact if the distribution of the test statistic
under the null hypothesis does not depend on any unknown parameters. If $r$
were known the statistic $F$ would have this property and the natural test would
be to regard $F$ as significant if $F > F_{r,mr(.95)}$. (Assume for this discussion a
standard 5% nominal significance level.) Since $r$ is unknown any test based on
$F$ must be non-exact and the natural non-exact test appears to be to regard $F$
as significant if $F > F_{\hat r,m\hat r(.95)}$. This test can also be formulated in terms of
quantities $\alpha$ and $\hat\alpha$. Define $\alpha$ as the significance level of the observed $F$ as a
function of the true parameter $r$, i.e. $\alpha$ satisfies $F = F_{r,mr(1-\alpha)}$. Similarly $\hat\alpha$ as a
function of the observed statistics $F$ and $w$ can be determined from $F = F_{\hat r,m\hat r(1-\hat\alpha)}$.
The unattainable exact test is that $F$ is significant if $\alpha < .05$; the non-exact test
defined is that $F$ is significant if $\hat\alpha < .05$.

The non-exact test still has a significance level (or size or probability of type
I error) but this is now a function of $r$. Denoting this function by $\gamma(r)$ we have

$$\gamma(r) = \Pr(\hat\alpha < .05)$$
$$\quad = \Pr(F > F_{\hat r,m\hat r(.95)})$$
$$\quad = \mathop{\mathrm{ave}}_{w}\left\{\Pr(F > F_{\hat r(w),m\hat r(w)(.95)} \mid w)\right\}$$

where $F$ is distributed as $F_{r,mr}$. The last version of $\gamma(r)$ indicates how $\gamma(r)$ can
be calculated for given $r$, i.e. by averaging a set of fairly well tabled probabilities
over a $\chi^2$ distribution. The major interest of this section is to determine the
relation between $\gamma(r)$ and the nominal significance level .05.

The distributions of $\alpha$ and $\hat\alpha$ can be compared by fixing $\alpha$ and looking at the
variability of the corresponding $\hat\alpha$. This amounts to conditioning the various
random variables by fixing $F$ to produce the desired $\alpha$, but leaving $w$
unconditioned. For any fixed $\alpha$, if $r$ is known, percentage points of $w$ can be translated
into percentage points of $\hat r$ and thence to percentage points of $\hat\alpha$. These are
denoted $(\hat\alpha \mid \alpha)_{(p)}$. Alternatively, for fixed $\alpha$, $r$ unknown, but $w$ observed,
confidence points for $r$ can be translated into confidence points for $\alpha$ and these will
also indicate how much $\hat\alpha$ varies about $\alpha$.

Short of actually calculating $\gamma(r)$ for various values of $r$, $m$, $n$ and .05, two
arguments will be advanced to show that it is near .05. The first argument is to use
a table to back up the belief that the disturbance caused by going from $\alpha$ to $\hat\alpha$
is not very great relative to the (0, 1) range of $\alpha$ and is well balanced with regard
to direction, so that the unconditional distribution of $\hat\alpha$ is not much different
from the uniform (0, 1) distribution of $\alpha$. Table 1 shows quartiles of $(\hat\alpha \mid \alpha)$ for
$m = 10$, $n = 64$, $\alpha = .05$ and $.10$, and $r = 6$ and $\infty$. This table indicates that the
disturbance in $\alpha$ caused by using $\hat\alpha$ is well-balanced near the 5% level and is a
slight shift towards 0 near 10%. The indication is that $\gamma(r)$ is very near .05.

TABLE 1
Quartiles of $(\hat\alpha \mid \alpha)$; rows $\alpha$ = .050, .100, .050, .100. [Quartile entries not recoverable.]
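The second argument involves computing the non-trivial limit $\gamma(\infty) = \lim_{r\to\infty} \gamma(r)$.
Define

$$(F - 1)^+ = \begin{cases} F - 1, & \text{if } F - 1 \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then, as $r \to \infty$,

$$\left[(F_{r,mr} - 1)^+\right]^2 \sim 0 \quad\text{or}\quad \frac{2}{r}\left(1 + \frac{1}{m}\right)\chi^2_1$$

each with probability $\tfrac12$. Similarly if $(1/\hat r) = (1/rn)\chi^2_n$ is put in for $1/r$,

$$\left[(F_{\hat r,m\hat r} - 1)^+\right]^2 \sim 0 \quad\text{or}\quad \frac{2}{r}\left(1 + \frac{1}{m}\right)\chi^2_1 \cdot \frac{1}{n}\chi^2_n$$

each with probability $\tfrac12$, where $\chi^2_1$ and $\chi^2_n$ are independent. From this

$$\gamma(\infty) = \lim_{r\to\infty} \Pr(F_{r,mr} > F_{\hat r,m\hat r(.95)})$$
$$\quad = \lim_{r\to\infty} \Pr\left(\left[(F_{r,mr} - 1)^+\right]^2 > \left[(F_{\hat r,m\hat r} - 1)^+\right]^2_{(.95)}\right)$$
$$\quad = \tfrac12 \Pr\left(\chi^2_1 > \left[\tfrac{1}{n}\chi^2_n\,\chi^2_1\right]_{(.90)}\right).$$

Now

$$\mathrm{ave}\left\{\tfrac{1}{n}\chi^2_n\,\chi^2_1\right\} = \mathrm{ave}\{\chi^2_1\} = 1
 \quad\text{and}\quad
 \mathrm{var}\left\{\tfrac{1}{n}\chi^2_n\,\chi^2_1\right\} = 2 + \frac{6}{n} > 2 = \mathrm{var}\{\chi^2_1\},$$

which indicates strongly that

$$\left[\tfrac{1}{n}\chi^2_n\,\chi^2_1\right]_{(.90)} > \left[\chi^2_1\right]_{(.90)},$$

i.e. $\gamma(\infty) < .05$, so that asymptotically the test is conservative as $r$ gets large.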

Since the foregoing table indicates smaller spread in $(\hat\alpha \mid \alpha)$ for finite $r$ than for
$r = \infty$ we might hope that $\gamma(r)$ is as near .05 as $\gamma(\infty)$ is.
In any particular case the spread in $(\hat\alpha \mid \alpha)$ can be examined by computing
confidence points of $\alpha$ from confidence points of $r$.

One feature of this test which might be regarded as a practical drawback is
the non-uniqueness of the vectors $U_1, U_2, \cdots, U_m$. These vectors resulted from
a choice of an orthogonal set of $m$ d.f. chosen arbitrarily in a space of
$m = n_1 + n_2 - 2$ dimensions. The symmetry of the normal distribution over any choice of
an orthogonal set of d.f. assures that the distribution theory of the test holds
for any such set, but there is no assurance that the observed statistics are
unchanged by different choices. In fact it can be easily seen that $\sum_{i=1}^{m} Q_i$ is
invariant under all choices so that $F = Q_0 / [(1/m)\sum_{i=1}^{m} Q_i]$ is also. Thus it is only in
the estimation of $r$ that variations occur, and, since we have heuristic evidence
of having used almost all the information about $r$ in our estimates $\hat r$ and since
$\hat r$ plays only a secondary role, the non-uniqueness of the significance test should
be of minor importance.


5. The sensitivity of the test. A natural parameter measuring separation of the
2 populations is distance between their means in a metric defined as follows
from their second order moments. Suppose that the metric inserted into affine
$k$-space from a priori information and used heretofore is denoted by $G_1$, and
suppose that the ellipsoid in affine space which appears as the unit sphere in $G_1$
is denoted by $E_1$. If, in an affine coordinate system for $k$-space, a sample point is
represented by $k \times 1$ vector $u$ and the corresponding $k \times k$ matrix of variances
and covariances is $A$, then an ellipsoid $E_2$ can be defined as $u'A^{-1}u = 1$. It is
easily seen that the same ellipsoid is defined by the same prescription in any
affine coordinate system so that given the distribution over affine space $E_2$ is
uniquely defined. $E_2$ may now be used to define a Euclidean metric $G_2$ in affine
space as that metric in which $E_2$ appears to be the unit sphere (so that the
distribution is sphericalized). Suppose the $G_2$-distance between population means
is $\tau$, i.e. $\nu_0$ has $G_2$-length $\tau$. Then $\tau$, which is also the ratio of mean difference to
standard deviation for that linear combination of original variables which
maximizes this ratio, may be taken as the parameter measuring difference
between population means, and we would like to know if our test is sensitive to
large $\tau$.

Now ave$\{V_0\} = \nu_0$ with $G_2$-length $\tau$, so ave$\{U_0\} = \xi = (1/n_1 + 1/n_2)^{-1/2}\nu_0$ with
$G_2$-length $(1/n_1 + 1/n_2)^{-1/2}\tau = \tau_1$ say. Denote by $Q_0(\xi)$ the squared $G_1$-length of
$U_0$, by $P_0(\xi)$ the squared $G_2$-length of $U_0$ and by $R(U_0)$ the squared ratio of the
radius of $E_2$ to the radius of $E_1$, both radii in the direction of $U_0$. (This ratio of
lengths in one direction is independent of the particular metric.) Then

(5.1) $\qquad Q_0(\xi) = R(U_0)\,P_0(\xi).$

Assuming normality the distributions appear in $G_2$ as spherical unit normals so
that $P_0(\xi)$ has the non-central $\chi^2$ distribution $\chi^2_k(\tau_1)$ defined as the distribution of
$(v_1 + \tau_1)^2 + v_2^2 + \cdots + v_k^2$ where $v_1, v_2, \cdots, v_k$ are NID(0, 1). Unfortunately
$R(U_0)$ has a distribution depending on the direction of $\xi$ as well as its length.
This may be contrasted with the statistic $T^2$ usable if $k \le m$, which is
non-centrally distributed as

$$T^2 = \frac{m\,\chi^2_k(\tau_1)}{\chi^2_{m-k+1}}$$

with the numerator independent of the denominator [9], which distribution
depends only on $\tau_1$ and on no other properties of $\xi$. The present situation has the
undesirable feature that the $G_1$-metric may have been selected in such a way that the
$G_1$-length of $\xi$ is too small to cause a significant disturbance in $Q_0$ whereas $\tau_1$ is
large enough to cause a significant disturbance in $\chi^2_k(\tau_1)$. An extreme example of
this would occur if the populations were not of full rank but lay in separate
parallel hyperplanes but still very close in $G_1$. Here $\tau = \infty$ but $Q_0(\xi)$ could very well
be little disturbed. As long as $\xi$ is regarded as having non-random direction and
$G_1$ cannot be chosen to coincide with $G_2$ there is danger of insensitivity to a large
$\tau$ arising from this source. On the other hand if it is permissible to assume
randomness for $\xi$ then this danger can be controlled on the average, and further
discussion proceeds along these lines.

The high-dimensional case is likely to arise in practice when little or nothing
is known about the separating power of individual variables. If nothing is
supposed known it may be reasonable to think of $\xi$ as random with all directions
intuitively equally likely. The only affine choice consistent with this intuitive
notion is to make $\xi$ uniformly distributed with respect to $G_2$-direction, so the
first case considered will be where $\xi$ has constant $G_2$-length $\tau_1$ and is uniformly
distributed with respect to $G_2$-direction independently of the within sample
variation.

Under this assumption and normality $U_0(\xi) = \xi + U_0(0)$ where the 2 vectors
on the right are independent, each with directions distributed uniformly in $G_2$.
Also, due to the $G_2$-spherical symmetry of the distribution of $U_0(0)$, the $G_2$-length
of $U_0(0)$ is distributed independently of its $G_2$-direction. It follows that $U_0(\xi)$
has independently distributed $G_2$-length and $G_2$-direction, so that in the equation
(5.1) the 2 terms on the right are independent. Also the distribution of $R(U_0)$
is independent of $\xi$. In our standard approximation of $Q_0(0)$ by $\mu\chi^2_r$ by fitting first
2 moments we have ave$\{Q_0(0)\} = \mu r$ and ave$\{Q_0^2(0)\} = \mu^2 r(r + 2)$. Also
ave$\{P_0(0)\}$ = ave$\{\chi^2_k\} = k$ and ave$\{P_0^2(0)\} = k(k + 2)$. Thus

$$\mathrm{ave}\{R(U_0)\} = \frac{\mathrm{ave}\{Q_0(0)\}}{\mathrm{ave}\{P_0(0)\}} = \frac{\mu r}{k},$$

$$\mathrm{ave}\{R^2(U_0)\} = \frac{\mathrm{ave}\{Q_0^2(0)\}}{\mathrm{ave}\{P_0^2(0)\}} = \frac{\mu^2 r(r + 2)}{k(k + 2)},$$

so that

$$\mathrm{ave}\{Q_0(\xi)\} = \mathrm{ave}\{R(U_0)\} \cdot \mathrm{ave}\{\chi^2_k(\tau_1)\} = \mu r\left(1 + \frac{\tau_1^2}{k}\right),$$

$$\mathrm{ave}\{Q_0^2(\xi)\} = \mathrm{ave}\{R^2(U_0)\} \cdot \mathrm{ave}\{[\chi^2_k(\tau_1)]^2\}
 = \mu^2 r(r + 2)\left(1 + \frac{2\tau_1^2}{k} + \frac{\tau_1^4}{k(k+2)}\right),$$

$$\mathrm{var}\{Q_0(\xi)\} = 2\mu^2 r\left(1 + \frac{2\tau_1^2}{k} + \frac{k - r}{k + 2}\,\frac{\tau_1^4}{k^2}\right).$$


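The distribution of $Q_0(\xi)$ is clearly not non-central $\chi^2$ since the variance of the
latter does not involve a term in $\tau_1^4$. For practical purposes it would be reasonable
to fit a $\chi^2$ shape to this distribution by fitting first 2 moments, i.e. $\lambda\chi^2_\rho$ where

$$\lambda = \mu\,\frac{1 + \dfrac{2\tau_1^2}{k} + \dfrac{k - r}{k + 2}\,\dfrac{\tau_1^4}{k^2}}{1 + \dfrac{\tau_1^2}{k}},
 \qquad
 \rho = r\,\frac{\left(1 + \dfrac{\tau_1^2}{k}\right)^2}{1 + \dfrac{2\tau_1^2}{k} + \dfrac{k - r}{k + 2}\,\dfrac{\tau_1^4}{k^2}}.$$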


Now it is possible to compute approximate "power functions" and "confidence
limits" for $\tau$ by assuming $F$ for $\tau > 0$ approximately distributed as $(\lambda\rho/\mu r)\,F_{\rho,mr}$
and by adopting the procedure used with significance testing of replacing $r$ by $\hat r$.
These "power functions" and "confidence limits" are actually estimates of the
true power functions and confidence limits associated with the non-exact test
just as $\hat\alpha$ was an estimate of $\alpha$. The deviation of the estimated power from the
true power may again be expected to be near zero and balanced about zero.
Using confidence points of $r$, confidence points for any particular value of the
power function may be found and these will indicate the order of the disturbance
caused by replacing $r$ by $\hat r$.
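The two-moment fit of $Q_0(\xi)$ can be written out as a short calculation. The sketch below assumes the moment formulas of this section (mean $\mu r(1 + \tau_1^2/k)$ and the variance with the $(k - r)/(k + 2)$ term) and returns the scale and degrees of freedom of the fitted $\chi^2$ shape. The function name and argument names are hypothetical.

```python
def noncentral_fit(mu, r, k, tau1):
    """Mean and variance of Q_0(xi) under the uniform-direction assumption,
    plus the scale lam and degrees of freedom rho of the chi-square shape
    lam * chi2_rho having the same first two moments."""
    x = tau1 * tau1 / k
    mean = mu * r * (1.0 + x)
    bracket = 1.0 + 2.0 * x + (k - r) / (k + 2.0) * x * x
    var = 2.0 * mu * mu * r * bracket
    lam = var / (2.0 * mean)        # from var = 2 * lam**2 * rho
    rho = 2.0 * mean * mean / var   # and   mean = lam * rho
    return mean, var, lam, rho
```

Two sanity checks follow from the formulas: with $\tau_1 = 0$ the fit returns $\lambda = \mu$ and $\rho = r$, and with $r = k$, $\mu = 1$ the mean and variance reduce to those of a non-central $\chi^2_k$, namely $k + \tau_1^2$ and $2(k + 2\tau_1^2)$.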

For convenience a criterion different from the power function will be used to
measure the sensitivity of the test, namely $\tau_2$, the value of $\tau$ which will produce
on the average a barely significant test statistic. Regarding $(1/m)\sum Q_i$, the
denominator of $F$, as $(\mu/m)\chi^2_{mr}$,

$$\mathrm{ave}\{F\} = \mathrm{ave}\{Q_0(\xi)\} \cdot \mathrm{ave}\left\{\frac{m}{\mu}\,\frac{1}{\chi^2_{mr}}\right\}$$
$$\quad = \mu r\left(1 + \frac{\tau_1^2}{k}\right)\frac{m}{\mu}\,\frac{1}{mr - 2}$$
$$\quad = \left(1 - \frac{2}{mr}\right)^{-1}\left(1 + \frac{\tau_1^2}{k}\right)$$

so that $\tau_2$ satisfies

$$\left(1 - \frac{2}{mr}\right)^{-1}\left[1 + \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1}\frac{\tau_2^2}{k}\right] = F_{r,mr(.95)}$$

or, since, for large $r$, $F_{r,mr} \sim N(1, (2/r)[1 + (1/m)])$, $\tau_2$ asymptotically satisfies

$$1 + \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1}\frac{\tau_2^2}{k} = 1 + 1.65\left(\frac{2}{r}\right)^{1/2}\left(1 + \frac{1}{m}\right)^{1/2}$$

or

$$\tau_2^2 = N\left(\frac{1}{n_1} + \frac{1}{n_2}\right)k\,r^{-1/2}$$

where $N = 1.65(2)^{1/2}(1 + (1/m))^{1/2} \approx 2.3$. Note that for a given experiment $r^{-1/2}$ is
the only factor in $\tau_2$ which depends on $G_1$. This result is encouraging, for suppose
we have a set of variables with equal but possibly small individual separation
parameters $p$. If the within sample variation is independent from variable to
variable then $\tau^2 = kp^2$. Thus if $G_1$ can be chosen such that

$$r^{1/2} > \frac{N}{p^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

then separation would show on the average. This implies that regardless of how
small $p$ is we need only go on adding variables of separation $p$ until $r$ has been
built up to correct size. Whether it is possible to continue indefinitely adding
variables with small separations in a practical case is uncertain, but the example
does show how small individual separations can produce something that will
show.
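The asymptotic sensitivity formula lends itself to a quick calculation. The sketch below evaluates $N = 1.65\,(2)^{1/2}(1 + 1/m)^{1/2}$, the squared barely-significant separation $N(1/n_1 + 1/n_2)\,k\,r^{-1/2}$, and the size of $r$ needed for variables of individual separation $p$ to show on the average. The function names are hypothetical, and the value 1.65 is the one-sided 5% normal point used in the text.

```python
import math


def sensitivity_constant(m, normal_point=1.65):
    """N = 1.65 * sqrt(2) * sqrt(1 + 1/m)."""
    return normal_point * math.sqrt(2.0) * math.sqrt(1.0 + 1.0 / m)


def barely_significant_tau_squared(m, n1, n2, k, r):
    """Asymptotic squared separation giving a barely significant F on average."""
    return sensitivity_constant(m) * (1.0 / n1 + 1.0 / n2) * k / math.sqrt(r)


def r_needed(p, m, n1, n2):
    """Value of r at which sqrt(r) equals (N / p**2) * (1/n1 + 1/n2)."""
    root = sensitivity_constant(m) * (1.0 / n1 + 1.0 / n2) / (p * p)
    return root * root
```

For instance, with $n_1 = n_2 = 6$ (so $m = 10$) the constant is about 2.45, slightly above the large-$m$ value of about 2.3.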

If there is some feeling that $\xi$ is not uniformly distributed relative to
$G_2$-direction an alternative would be to suppose it uniform relative to a different metric
$G_3$ with ellipsoid $E_3$, i.e. when $E_3$ appears as the unit sphere $\xi$ appears of length
$\sigma$ and uniform with regard to direction independent of $Q_0(0)$. Then a priori
knowledge of the separating powers of the variables could be supposed to consist of
some information about $E_3$. Suppose the mean square $G_1$-length of $\xi$ is $A^2\sigma^2$
where $A^2$ depends only on $E_3$, and suppose the mean square $G_1$-length of the
centrally distributed $U_0$ is $B^2 = \mu r = \mathrm{ave}\{Q_0(0)\}$. Then

$$\mathrm{ave}\{Q_0(\xi)\} = B^2 + A^2\sigma^2 = \mu r\left(1 + \frac{A^2\sigma^2}{B^2}\right)$$

so that $\sigma_2$ producing significance on the average is given, in the asymptotic case, by

$$\sigma_2^2 = N\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{B^2}{A^2}\,r^{-1/2}$$

where now the choice of $G_1$ can influence both $B/A$ and $r$.

We are now in a position to discuss theoretical issues concerning the original
choice of metric $G_1$. These suggest that for most purposes the aim should be to
make $G_1$ and $G_2$ coincide as nearly as possible except for a scale factor. The
practical question of how well this can be accomplished is not discussed, nor is it
crucial for the use of the method. There are 2 issues in the choice of $G_1$: sensitivity
of the test and safety of the assumptions.

If $G_1$ is related to $G_2$ by a scale factor, then the statistic $Q$ is distributed as $\mu\chi^2_k$,
i.e., $r = k$, and under normality the approximation to the distribution of $Q$ by
$\mu\chi^2_r$ is exact. This is the way in which choice of $G_1$ can be made to improve the
assumptions. It is heuristically evident that a larger value of $r$ results in less
likelihood that approximations of this kind will go wrong.

Regarding sensitivity it can be seen that only when $E_2$ is $G_1$-spherical is there
equal sensitivity to a separation of $\tau$ in all directions and so no danger of
insensitivity to large $\tau$. Also it has been seen that when the direction of $\xi$ is assumed
$G_2$-uniformly distributed $\tau_2$ depends on $r^{-1/4}$, so again there is evidence that
maximizing $r$ to $k$ gives greatest sensitivity. However, under the alternative
randomness assumption of $\xi$ uniform over ellipsoid $E_3$ the situation appears more
complicated, for the factor in $\sigma_2^2$ to be minimized by choice of $G_1$ is $(B^2/A^2)\,r^{-1/2}$. This
suggests that if something is known about the shape of $E_3$ as well as $E_2$ then $E_1$
should be chosen to give more weight to those directions in which $E_3$ is long
relative to $E_2$ provided this does not too greatly depress $r$. It is felt that this last
suggestion may be occasionally useful but the general rule will be to try to make
$r = k$.


6. Asymptotic behavior. In the foregoing are many results asymptotically
true as $r \to \infty$ with $m$ fixed. Certainly these are a mathematical convenience. The
question of whether indefinitely large $r$ can be practically obtained remains open.
Certainly if $k$ can be made arbitrarily large and each of the $k$ variables contains
a part independent of the rest then in theory $r$ can be made arbitrarily large
because a metric can be chosen such that $r = k$. What is much more in doubt is
whether or not variables could be chosen which would give $r$ indefinitely
increasing and $\tau$ also increasing at a rate such that the sensitivity of our method
would continue to improve.

Whether it is practically attainable or just mathematically useful the following
geometrical picture of the asymptotic case is illuminating. Consider throughout
the approximate model of section 3 and its asymptotic behavior. As $r \to \infty$ the
coefficient of variation of $Q$ (i.e. $\mu\chi^2_r$) tends to 0, so that if we back away from the
picture at the correct rate as $r$ increases the vectors $U_1, \cdots, U_m$ will appear to
all approach in probability the same constant length. Also since $1 - S_i \sim (1/r)\chi^2_1$
each angle between vectors tends in probability to $\pi/2$, so they tend to an
orthogonal set of $m$ equal length vectors. Vector $U_0$ also becomes perpendicular to
$U_1, \cdots, U_m$ but its length depends on $\tau$. However if its length should differ
from the common limiting lengths of the rest by a factor as great as $(1 + N r^{-1/2})^{1/2}$
this is roughly what would be called significant, so that asymptotically a
significant $U_0$ could be indistinguishably different from the rest.

An implication of this asymptotic picture is as follows. For small $r$ it would be
natural to compare $Q_0(\xi)$ from $U_0$ more closely with those $Q_i$ from $U_i$ making the
smallest angles with $U_0$, because if $U_0$ and $U_i$ are close then $R(U_0)$ and $R(U_i)$
are likely to be more nearly the same. $T^2$ accomplishes this in a neat manner
which disappears when $k > m$, but the present method makes no attempt to do
it. The asymptotic picture says that in the limit there is no hope of making such
a correction, for if $U_0$ is nearly at right angles with every $U_i$ then the radii of
$E_1$ and $E_2$ in the direction of $U_0$ bear no relation to the radii in the direction of
the $U_i$, i.e. there are too many directions for $U_0$ to take to hope that it will be
near enough to any $U_i$ to make any difference.


7. Acknowledgments. Heartiest thanks are due to Professor John W. Tukey 
of Princeton University for his generous guidance in this research.


ON THE KOLMOGOROV AND SMIRNOV LIMIT THEOREMS FOR
DISCONTINUOUS DISTRIBUTION FUNCTIONS


By Paul Schmid
Swiss Forest Research Institute and Federal Institute of Technology
1. Introduction. Let $X_1, X_2, \cdots, X_N$ be $N$ independent random variables
with the same distribution function $F(x)$. $S_N(x)$ is the empirical distribution
function, i.e., $S_N(x) = k/N$ if exactly $k$ of the $N$ values $X_i$ are less than or equal
to $x$. It is of theoretical and practical interest to analyze the behavior of the
statistics

$$\sup_{-\infty < x < \infty} |S_N(x) - F(x)| \cdot N^{1/2},$$

$$\sup_{-\infty < x < \infty} (S_N(x) - F(x)) \cdot N^{1/2}.$$

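Kolmogorov [12] proved in a famous paper in 1933 that for $\lambda > 0$

I $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{-\infty < x < \infty} |S_N(x) - F(x)| \cdot N^{1/2} < \lambda\Big] = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2}$

if $F(x)$ is a continuous distribution function. Smirnov [21] obtained a similar
result in 1939, when he showed that

II $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{-\infty < x < \infty} (S_N(x) - F(x)) \cdot N^{1/2} < \lambda\Big] = 1 - e^{-2\lambda^2}$

holds for continuous distribution functions $F(x)$.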
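The two classical limit distributions for continuous $F$ are easy to evaluate numerically. The sketch below sums the alternating series $\sum_k (-1)^k e^{-2k^2\lambda^2}$ until the terms are negligible and evaluates $1 - e^{-2\lambda^2}$ directly. The function names and the stopping tolerance are illustrative choices.

```python
import math


def kolmogorov_limit(lam):
    """Limit of P[sqrt(N) * sup |S_N - F| < lam] for continuous F."""
    if lam <= 0:
        return 0.0
    total = 1.0  # the k = 0 term
    k = 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        total += 2.0 * (-1) ** k * term  # terms k and -k are equal
        if term < 1e-17:
            break
        k += 1
    return min(1.0, max(0.0, total))  # guard against rounding noise


def smirnov_limit(lam):
    """Limit of P[sqrt(N) * sup (S_N - F) < lam] for continuous F."""
    if lam <= 0:
        return 0.0
    return 1.0 - math.exp(-2.0 * lam * lam)
```

For small $\lambda$ the series converges slowly and the result is dominated by cancellation, so values below about $10^{-12}$ should not be trusted from this direct summation.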

Kolmogorov resorts in his proof to a generalization of the Central Limit
theorem, whereas Smirnov's theorem was a corollary to a more intricate theorem.
But the two formulae can be proved by reciprocal methods. They have also been
proved by Feller [11] and by Doob [10] and Donsker [9]. Feller made use of
characteristic functions and Doob employed stochastic processes. Smirnov
[22] found in 1944 the first terms of the asymptotic expansion for the probability
in II and an exact formula for finite $N$. Chung [7] and Blackman [5], [6] were
successful in finding the asymptotic expansion for the probability in I.

A somewhat more general form of the statistics, namely

$$\sup_{-\infty < x < \infty} |S_N(x) - F(x)|\, N^{1/2} \cdot \psi(F(x)),$$

where $\psi(y)$ is a positive definite weight function, was discussed by Anderson and
Darling [1]. They found the limit distributions for some special weight functions,
by means of stochastic processes. Similar results were obtained by Maniya [16]
and Malmquist [15]. Rényi [19], in 1953, established the relations

III $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{a \le F(x)} \frac{|S_N(x) - F(x)|}{F(x)}\, N^{1/2} < \lambda\Big]
 = \frac{4}{\pi}\sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} \exp\Big[-\frac{(2k+1)^2\pi^2(1-a)}{8a\lambda^2}\Big]$

and

IV $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{a \le F(x)} \frac{S_N(x) - F(x)}{F(x)}\, N^{1/2} < \lambda\Big]
 = \Big(\frac{2}{\pi}\Big)^{1/2}\int_0^{\lambda[a/(1-a)]^{1/2}} e^{-t^2/2}\,dt,$

where $F(x)$ is a continuous distribution function, $a > 0$, $\lambda > 0$.

Received October 11, 1957; revised June 20, 1958.

The statistics treated here are well suited to test if a sample comes from a
population with the distribution function $F(x)$. These test functions have the
great advantage that their distributions are independent of the distribution
$F(x)$ of the population. Massey [17], Birnbaum [2], and Malmquist [15]
investigated the power of the statistics of Kolmogorov and Smirnov. The limit
distributions of these statistics have been tabulated by Smirnov [23], and the
distribution for finite $N$ by Massey [18], Birnbaum and Tingey [3], [4]. Rényi
tabulated his own limit distributions. Hence, today it is practicable to use these
statistics.

In this paper Theorems I through IV are extended to the case of discontinuous
distribution functions $F(x)$. The probabilities in question converge also in this
case, but the limit distributions are no longer independent of $F(x)$. They depend
on the values of $F(x)$ at the discontinuity points, but not on the form of the
function between the points of discontinuity. Theorems 1 and 2 are proved by a
generalization of the method of Kolmogorov. They can also be proved with the
help of stochastic processes, as Doob did for the case of continuous $F(x)$. We
omit the presentation of this method since it involves techniques similar to those
of Anderson and Darling. The proofs of Theorems 3 and 4 follow in part the
methods applied by Rényi, but also make use of the generalization by
Kolmogorov of the Central Limit theorem. A part of these results has already been
published [20].

I should like to thank W. Saxer for suggesting this topic. 


2. Extension of the limit theorems of Kolmogorov and Smirnov. Let $F(x)$ be a
distribution function continuous for $x \ne x_\nu$, where $F(x_\nu - 0) = f_{2\nu-1}$, $F(x_\nu) = f_{2\nu}$
for $\nu = 1, 2, \cdots, n$, and $f_0 = 0$, $f_{2n+1} = 1$. Denote the corresponding empirical
distribution function by $S_N(x)$.

THEOREM 1. If $\lambda > 0$, then

(1) $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{-\infty < x < \infty} |S_N(x) - F(x)| < \lambda N^{-1/2}\Big] = \Phi(\lambda),$

(2) (A) = 2 (aah | ree I exp| -3 aX Aj; 2; x, | dx, --* dim,
where


(fs41 a LIS; 74 f 1) om oe fj a $i’ 


0, for i<j-1 or t>j+l1, 


2n+1 


= (2x) I] (f, — J," 


{—d < Z20~1 _ 2r(p, — kfoy—1) < A, 


=—s 


—r < ta + Wp, + kfe») 
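THEOREM 2. If $\lambda > 0$, then

(3) $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{-\infty < x < \infty} (S_N(x) - F(x)) < \lambda N^{-1/2}\Big] = \Phi^+(\lambda),$

(4) $\qquad \displaystyle\lim_{N\to\infty} P\Big[\sup_{-\infty < x < \infty} (F(x) - S_N(x)) < \lambda N^{-1/2}\Big] = \Phi^+(\lambda),$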

1 2n 
(5) 7) = D (-1)*e me os f: exp| — > Aaya, | dn es 


k=) t,j=1 


where 
;—-o< (—1)?" (22,1 + 2k . fers) os 2Ap, — A, 


—2 < (—1)"’(x, + 2vkfe) + 2\p, < A, 


For $\lambda \le 0$ all limits are 0. The convergence is uniform in $\lambda$ in all cases.

If the number of jumps of $F(x)$ is countably infinite, a further limit process has
to be made in which at first only the highest jumps of $F(x)$ are taken into
account. The two limit processes can be interchanged, because $\Phi(\lambda)$ and $\Phi^+(\lambda)$ are
continuous functions of the values of $F(x)$ at the points of discontinuity. Hence
further difficulties do not arise in this, the most general case. We will prove
Theorem 1 for the case of a distribution function for which the inequalities


Fors > fo > y= 0, . stacey 


are valid. The results must then hold for any distribution function with n jumps, 
because both sides of (1) depend continuously on the f’s. 

If the random variable $X$ has the distribution function $F(x)$, then $Y = F(X)$ is
also a random variable, the distribution of which has to fulfill

$$P[F(X) \le 0] = 0, \qquad P[F(X) \ge 1] = 0, \qquad P[f_{2\nu-1} \le F(X) < f_{2\nu}] = 0$$

and, for $f_{2\nu} \le y \le f_{2\nu+1}$,

$$P[F(X) \le y] = P[X \le F^{-1}(y)] = F(F^{-1}(y)) = y.$$

Furthermore, since

$$P[F(X) = f_{2\nu}] = P[X = x_\nu] = f_{2\nu} - f_{2\nu-1},$$

$Y$ will have the distribution function

(6) $\qquad F^0(y) = \begin{cases}
0, & \text{for } y < 0, \\
y, & \text{for } f_{2\nu} \le y \le f_{2\nu+1},\ \nu = 0, 1, \cdots, n, \\
f_{2\nu-1}, & \text{for } f_{2\nu-1} \le y < f_{2\nu},\ \nu = 1, 2, \cdots, n, \\
1, & \text{for } y \ge 1.
\end{cases}$

Let $S_N^0(y)$ be the empirical distribution function corresponding to $F^0(y)$. Then
we have, for $f_{2\nu} \le F(x) \le f_{2\nu+1}$,

$$S_N^0(F(x)) = \frac{1}{N}\,(\text{Number of } F(X_i),\ F(X_i) \le F(x))$$
$$\quad = \frac{1}{N}\,(\text{Number of } X_i,\ X_i \le x) = S_N(x)$$

and $F^0(F(x)) = F(x)$. Hence

$$\sup_{-\infty < x < \infty} |S_N(x) - F(x)| = \sup_{-\infty < x < \infty} |S_N^0(x) - F^0(x)|,$$

because the other values of $F(x)$ cannot be attained. If we denote by $J$ the union
of the closed intervals $[f_{2\nu}, f_{2\nu+1}]$, $\nu = 0, 1, \cdots, n$, we obtain

(7) $\qquad \displaystyle P\Big[\sup_{-\infty < x < \infty} |S_N(x) - F(x)| < \lambda N^{-1/2}\Big] = P\Big[\sup_{x \in J} |S_N^0(x) - x| < \lambda N^{-1/2}\Big].$
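Denote by $M_N$ the set of integers $j$ such that $j/N \in I$,

\[ M_N = \{k_0 = 0, \dots, k_1;\; k_2, k_2 + 1, \dots, k_3;\; \dots;\; k_{2n}, \dots, k_{2n+1} = N\}. \tag{8} \]

The $k_i$ are defined such that $k_i/N \to f_i$ as $N \to \infty$. We wish to analyze the
behavior of

\[ P\Bigl[\max_{j \in M_N} \Bigl|S_N^{*}\Bigl(\frac{j}{N}\Bigr) - \frac{j}{N}\Bigr| < \lambda N^{-1/2}\Bigr] \tag{9} \]

when $N \to \infty$.
The event $E_{ik}$, $k \in M_N$, happens if simultaneously all inequalities

\[ \Bigl|S_N^{*}\Bigl(\frac{j}{N}\Bigr) - \frac{j}{N}\Bigr| < \lambda N^{-1/2}, \qquad \text{for } j \le k,\ j \in M_N, \]

and the equality

\[ S_N^{*}\Bigl(\frac{k}{N}\Bigr) - \frac{k}{N} = \frac{i}{N} \]

are fulfilled. $P_{ik}$ is the probability of $E_{ik}$. $P_{0N}$ is equal to the probability in (9).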
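We can calculate the $P_{ik}$ recursively by means of the initial conditions $P_{00} = 1$,
$P_{i0} = 0$ for $i \ne 0$, and the equations

\[ P_{i,k+1} = \sum_{|j| < \lambda N^{1/2}} P_{jk}\, P[E_{i,k+1} \mid E_{jk}] \tag{10} \]

for $k \in M_N$, $k + 1 \in M_N$, and

\[ P_{i,k_{2\nu}} = \sum_{|j| < \lambda N^{1/2}} P_{j,k_{2\nu-1}}\, P[E_{i,k_{2\nu}} \mid E_{j,k_{2\nu-1}}] \tag{11} \]

for $\nu = 1, \dots, n$.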


The occurring conditional probabilities give 


Po f(K+t1\)) wfk\_t-—jt1| wo (fk 
“Cijsieey) GP) 
NG HG HtIAN —-—& wk 


fork eMy,k+1eMy, and 
s Kay = Ss Kay a i= j + ke — ko Ss a») = Koa + j 
N N N : N N 
s ( N aa i A. j ) ke, oo = oe 1 N- 3 
t— Jt ke — koiJ\N — kor N — ko ; 


for vy = 1, --- , n, according to the laws of the binomial distribution. 
The recursion formulae can be simplified if we introduce the new terms

\[ Q_{ik} = \frac{N^{N}\,(N-k-i)!}{N!\,(N-k)^{N-k-i}\, e^{k}}\; P_{ik}. \tag{14} \]

Now we have

\[ Q_{00} = 1; \qquad Q_{i0} = 0 \ \text{ for } i \ne 0; \qquad Q_{i,k+1} = \frac{1}{e} \sum_{|j| < \lambda N^{1/2}} \frac{Q_{jk}}{(i-j+1)!}, \]

for $k \in M_N$ and $k + 1 \in M_N$; $|i| < \lambda N^{1/2}$,

\[ Q_{i,k_{2\nu}} = \sum_{|j| < \lambda N^{1/2}} Q_{j,k_{2\nu-1}}\, \frac{(k_{2\nu}-k_{2\nu-1})^{\,i-j+k_{2\nu}-k_{2\nu-1}}}{(i-j+k_{2\nu}-k_{2\nu-1})!\; e^{k_{2\nu}-k_{2\nu-1}}} \tag{15} \]

for $\nu = 1, \dots, n$, and for the probability (9) we obtain

\[ P_{0N} = \frac{N!\, e^{N}}{N^{N}}\; Q_{0N}. \tag{16} \]

For finite $N$, $P_{0N}$ can be evaluated, but as $N \to \infty$ the number of necessary recursion
steps tends to infinity.


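Let $Y_j$, $j \in M_N$, be independent random variables with the distributions

\[ P\Bigl[Y_j = \frac{i-1}{N^{1/2}}\Bigr] = \frac{1}{e\, i!}, \qquad i = 0, 1, 2, \dots;\ j \ne k_{2\nu}, \tag{17} \]

\[ P\Bigl[Y_{k_{2\nu}} = \frac{k_{2\nu-1} - k_{2\nu} + i}{N^{1/2}}\Bigr] = \frac{(k_{2\nu}-k_{2\nu-1})^{i}}{i!\; e^{k_{2\nu}-k_{2\nu-1}}}, \qquad i = 0, 1, 2, \dots. \tag{18} \]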
Then 
E(Y,;) = 0, 
sane l 1/472 ko — ke, 1 
E(Y;) = t ke, : E(Y;:..) = ; ‘ 
) ev? J ~ ke ae A 
can Aa ae 2 | ; aia 8 (ke — key)! 
E(\ }y os i , ke: E(\ Yx.,\) ~ A, a3 
= (142) oy Jah ma N 


The event $D_{ik}$, $k \in M_N$, takes place if the inequalities

\[ \Bigl|\sum_{l \le j} Y_l\Bigr| < \lambda \]

for all $j \le k$ and the equality

\[ \sum_{l \le k} Y_l = \frac{i}{N^{1/2}} \]

are simultaneously fulfilled. The probability of $D_{ik}$ is $R_{ik}$; $R_{00} = 1$, $R_{i0} = 0$ for
$i \ne 0$.

We can easily verify that the recursion formulae for the $R_{ik}$ are the same as for
the $Q_{ik}$. Therefore,

\[ R_{ik} = Q_{ik}, \tag{19} \]

for all $i$ and $k$. For the probability in (9) we obtain

\[ P_{0N} = \frac{N!\, e^{N}}{N^{N}}\; R_{0N}. \tag{20} \]


Ro = 1, Rio = 0, for? # 0, 


(2] Binss er = be, Ries, P[Dix., +1 Djxo,], 2 0, ee 
\=1) i 
Ruz. = » Riu y (hay 2 ka ) - : ii ‘ y= 1, oo @ 2 ik 


cANS "(i — f+ ko — keoy_,)! ebteF2e-1 
These conditional probabilities can be written in the form


° l . 
Fide... | Da.) = P| —1 = = iw s,. 
(22) , aie 
(hea eo ea 
v2p , 9 2r4 7 r XN} , 


and their limits for $N \to \infty$ can be obtained by the following lemma of Kolmogorov.

LEMMA [12]. Let $Y_{M1}, \dots, Y_{M m_M}$ be, for each $M$, independent random variables, whose values are multiples of $\epsilon = \epsilon(M)$, with

\[ E(Y_{Mj}) = 0, \qquad E(Y_{Mj}^{2}) = 2 b_{Mj}, \qquad E(|Y_{Mj}|^{3}) = d_{Mj}. \]

Let $a$ and $b$ be two numbers such that $a < 0$ and $b > 0$. Assume the existence of
positive numbers $A, \dots, E$, such that, for all $M$, the inequalities (i) through (iv)
are fulfilled:


(i) A < 7 bu; x B, 


i=l 


(ii) oat < Ce, for all j, 


I Mi 
(iii) P[Yu; = lujel > D and P[Yu; = (lu; + Wel > D for all j and suit- 
ably chosen I y;, 
(iv) a+ EE <iye << b—E. 
Then 


j m 
P | X 2 Yur < b,j = 1,2, +++, my; > Yur = ine | 


kewl 


mM 
= € (» (0, 0, iwe, 2 2 bus) + a) ; 
kal 


where $u(\sigma, \tau; s, t)$ is Green's function for the heat equation

\[ \frac{\partial u}{\partial t} = \frac{\partial^{2} u}{\partial s^{2}} \]

in the region $G$,
\[ G = \{a < s < b,\ t > 0\}. \]
If $\epsilon(M) \to 0$, then $\Delta \to 0$.

This lemma can be applied to the random variables $Y_{k_{2\nu}+1}, Y_{k_{2\nu}+2}, \dots, Y_{k_{2\nu+1}}$. It should be noticed that the variables $Y_{k_{2\nu}}$ do not fulfill condition (ii)
and hence must be treated separately. Our recursion formulae are now


Ro=1, Ro = 9, i <0, 
ins 1 4 j~ J kei — ke ) 
(23) Rates = \i(cawd Rites ON (« (0, 0: NT? oN ) ” ape 
y =O, ---,n, 
. — Ke i—j+key—koy-1 
Rits, = xu Rix (key = hea)" , ve=l, 


ij <a E(t — J + he — Kays) Lek2e—*ee-" 
where $u_j(\sigma, \tau; s, t)$ is Green's function for the heat equation in the region $G_j$,


, j j 
oe 8 =~ -1..,2> 6), 
G; 1 NI < Ss <x 1 XN! 0 
or 
1 <= 
u;(o, 7; 8, t) = >, (—1) 


2V/ x(t — 7) = 


(24) [- : J ( I 1 ( J ) 21), 
. ss (s + ANE —1) a+ NI > ia 


exp > 


If $N$ tends to infinity, the $\Delta$'s disappear and the sums over $j$ go over into integrals, with the exception of the sum in the first step which consists of only one
summand,
1 j hy ) 
R,. = 0, 0: — ; A}. 
as i (e( rf Jag) + 


With this exception all sums tend to finite positive limits. The factor in (20),
multiplied by $N^{-1/2}$, also tends to a finite limit, namely

\[ \frac{N!\, e^{N}}{N^{N} \sqrt{N}} \to \sqrt{2\pi}. \]
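The limit of this factor is Stirling's formula, and it can be checked numerically. The short sketch below is an illustration only; the function name `stirling_factor` is chosen here and the evaluation goes through logarithms, since $N!\,e^{N}$ overflows floating point for moderate $N$.

```python
import math

def stirling_factor(n):
    """Return N! e^N / (N^N sqrt(N)), evaluated through logarithms to avoid overflow."""
    log_value = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)
    return math.exp(log_value)

for n in (10, 1000, 10**6):
    print(n, stirling_factor(n), math.sqrt(2 * math.pi))
```

The factor approaches $\sqrt{2\pi}$ from above, with a relative excess of roughly $1/(12N)$.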


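For $r N^{-1/2} \to x$, we obtain

\[ \frac{N^{1/2}\,(k_{2\nu}-k_{2\nu-1})^{\,r+k_{2\nu}-k_{2\nu-1}}}{(r+k_{2\nu}-k_{2\nu-1})!\; e^{k_{2\nu}-k_{2\nu-1}}} \to \frac{1}{\sqrt{2\pi (f_{2\nu}-f_{2\nu-1})}} \exp\Bigl[-\frac{x^{2}}{2 (f_{2\nu}-f_{2\nu-1})}\Bigr]. \]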

Finally we have 


lim P | max Si () _ + <AN ; 


0 Da 1 
= DY (=F (Qe) TT Of; -— fa) 
: — j=l 
(25) - o n - 
? a ] (Xo, — Ly 1) 
| | exp | 3 2 o> Te 
A<z A 


] e (ro41 — (—1)" x, — ait | 
ee st - — dx, eee AX > 
2 va 0 Sova = i 
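where $x_0$ and $x_{2n+1}$ should be replaced by 0. This expression is $\Phi(\lambda)$.

Let us now prove that for those values of $\lambda$ and sequences of $N$ for which $\lambda N^{1/2}$
are integers,

\[ \lim_{N \to \infty} P\Bigl[\sup_{x \in I} |S_N^{*}(x) - x| < \lambda N^{-1/2}\Bigr] = \lim_{N \to \infty} P\Bigl[\max_{j \in M_N} \Bigl|S_N^{*}\Bigl(\frac{j}{N}\Bigr) - \frac{j}{N}\Bigr| < \lambda N^{-1/2}\Bigr] \tag{26} \]

must be true.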
To each $x \in I$ there exists a $j \in M_N$ such that either $x = j/N$ or $x = j/N + \epsilon$
with $0 < \epsilon < 1/N$. Set $S_N^{*}(x) = i/N$. From
4 i—j _ rN 
Sx(z) — 2 = NW J_e¢2 a 
follows, for « > 0, 


Sy (4+) ~1 ts Ape 


i+ 


J 


_ 


f= 4 = AN! 
N — 


because the value to the right is a multiple of 1/N. From 


| 


N y 


; rio te rN! 
S\(2) =a fF & 4 = = < — N 


follows analogously 


. . ee rN! 
Ss J ~j<« Si (x as -J< at nies - 
(2) w © Sx@) — 5 N ™ N 

The second probability in (26) cannot be smaller than the first one and the
limit of the second probability depends continuously on the endpoints of the
intervals of $I$. Therefore the two limits have to be equal. The convergence must
be uniform in $\lambda$, since $\Phi(\lambda)$ is a bounded and continuous function. Hence

\[ P\Bigl[\sup_{x \in I} |S_N^{*}(x) - x| < \lambda N^{-1/2}\Bigr] \]

tends to $\Phi(\lambda)$ for all $\lambda$ and all sequences of $N$. In view of (7), this proves
Theorem 1.

Theorem 2 can be proved in a similar manner. We now disregard the absolute
value signs in the definition of $E_{ik}$ and $D_{ik}$. The summations in (10), (11), (15),
(21) and (23) go from $-\infty$ to $\lambda N^{1/2}$ and the lower boundaries for the partial sums
in (22) are omitted. Green's function for the heat equation in the region $G_j$


» 9 


f 


: j . 
G; ah ie 1 “yet > o 


is now 


1 


u;(o,7;3, ==> = >> (—1)' 


QV rit rs tT) foo 





Hence 


~ [max ( , (; un) < 


= (2) ” I 
fof ew [ 435 & = 


foo =a aa 





Ls (41 — (—1)" Zn — 2dje)” 
ee 2 Tar) )rdre = 2NJr) dx, +++ dion, 


2 v=x() fovat — Jo» 
where again $x_0$ and $x_{2n+1}$ are 0. This proves Theorem 2.


3. Extension of the limit theorems of Rényi. Let $F(x)$ be a continuous function
for $x \ne x_\nu$, with $F(x_\nu - 0) = f_{2\nu-1}$ and $F(x_\nu) = f_{2\nu}$ for $\nu = 1, 2, \dots, n$, and
$f_{2n+1} = 1$. Let $f_0$ be a positive number such that $f_0 \le f_1$. If $f_0 > f_1$, then we get
the same results except that only the $f_i \ge f_0$ will appear. Denote the empirical
distribution function by $S_N(x)$.

THEOREM 3. If $\lambda > 0$, then

\[ \lim_{N \to \infty} P\Bigl[\sup_{f_0 \le F(x)} \frac{|S_N(x) - F(x)|}{F(x)} < \lambda N^{-1/2}\Bigr] = \Psi(\lambda), \tag{27} \]


+2 


(28) wr) = Dd (-1)' af | exp | —} bo Aya | dxy +++ din, 
h 0 Hy = t,j)=0 


where 


(fiar — fi-df; —fifjr 
Bigg et eee A, @ hs tee, 
= (Fj41 ~e AT ™ fi-) — -_ (J; vv fi-1) 
Ai; = 9, forni<j-1l or i>jt+il, 
d = (2n)"? TT (fin — fad), 
and 
H,, = U {—-r < (—1)*xy + 2k < A; —A < (—1)""ay_1 + 2p, < A, 


—r < (—1)"’2y + 2Ap, < AVv = 1,°-°-, Nn}. 
THEOREM 4. If $\lambda > 0$, then


Sy(x) — F(x ae ' 
(29) lim P| sup v(t) Pa) <AN * 1] = V'(A), 
N22 fo< F(z) F (x) 
(o . F(x) = Swy(x) y—i + 
(30) lim P| sup - <4 * | = BA), 
N>2 fos F(z I (x) 
1 e 
(31) 9 Wa) = Y (- 1)! a| ies [ er] - SD Av asas| 
0 4 j dzy*+*dre,, 


whe re os 


1 


Hi= U f=» < (—1)*x) + 2vk < »X; 


Py Pr ) 


—x < (—1)"’ru-1 + 2p, < A, — 2% < (—1)”" xe, + 2p, <A,v = 1,--+ yn}. 
The convergence is in both theorems uniform in $\lambda$ and for $\lambda \le 0$ all limits are 0.

These theorems can also be extended for distribution functions with infinitely
many points of discontinuity.

We introduce again the random variable $Y = F(X)$ with the distribution function $F^{*}(x)$ and the set $I$ as the union of the intervals $[f_{2\nu}, f_{2\nu+1}]$, $\nu = 0, 1, \dots, n$.
For any $F(x) \in I$ we have


\[ \frac{S_N(x) - F(x)}{F(x)} = \frac{S_N^{*}(F(x)) - F^{*}(F(x))}{F^{*}(F(x))} \]
and therefore

\[ \sup_{f_0 \le F(x)} \frac{|S_N(x) - F(x)|}{F(x)} = \sup_{x \in I} \frac{|S_N^{*}(x) - x|}{x}. \]

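Let $R_N(x)$ be the empirical distribution function of a sample $Z_1, Z_2, \dots, Z_N$
from a population with the distribution
\[ P[Z \le z] = z, \qquad 0 \le z \le 1, \tag{32} \]
then
\[ P\Bigl[\sup_{x \in I} \frac{|S_N^{*}(x) - x|}{x} < \lambda N^{-1/2}\Bigr] = P\Bigl[\sup_{x \in I} \frac{|R_N(x) - x|}{x} < \lambda N^{-1/2}\Bigr], \tag{33} \]
since the distributions of the two populations coincide for $x \in I$. Thus,

\[ P\Bigl[\sup_{f_0 \le F(x)} \frac{|S_N(x) - F(x)|}{F(x)} < \lambda N^{-1/2}\Bigr] = P\Bigl[\sup_{x \in I} \frac{|R_N(x) - x|}{x} < \lambda N^{-1/2}\Bigr]. \tag{34} \]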
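The set $I_\epsilon$ is defined as the union of the intervals $[f_{2\nu} - \epsilon, f_{2\nu+1} + \epsilon]$, $\nu = 0, 1, \dots, n$, for $\epsilon > 0$. If $|R_N(x) - x| \le \epsilon$, then
\[ \sup_{R_N(x) \in I} \frac{|R_N(x) - x|}{x} \le \sup_{x \in I_\epsilon} \frac{|R_N(x) - x|}{x}, \]
since $R_N(x) \in I$ implies that $x \in I_\epsilon$. We see that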


p| sup Rx(z) —*| < aN 


(35) rel, l 


Fa as on 
< P|| Rx(z) — 2) >) + p| sup Rs ) 4 AN ‘|. 
xr 


Rylazjel 


By a similar procedure we have 


(36) Rw talel se : 


<= Pl|Ry(z) —x| >d + P| sup ae — 7! < aN ‘|. 


rel, ll 


It is sufficient to prove

\[ \lim_{N \to \infty} P\Bigl[\sup_{R_N(x) \in I} \frac{|R_N(x) - x|}{x} < \lambda N^{-1/2}\Bigr] = \Psi(\lambda) \tag{37} \]

since the probability
\[ P[\,|R_N(x) - x| > \epsilon\,] \]
tends to 0 as $N \to \infty$. The function $\Psi(\lambda)$ is continuously dependent on the
boundaries of $I$. Therefore, from (35), (36) and (37) we get

\[ \lim_{N \to \infty} P\Bigl[\sup_{x \in I} \frac{|R_N(x) - x|}{x} < \lambda N^{-1/2}\Bigr] = \Psi(\lambda) \tag{38} \]
and from (34) follows the statement of Theorem 3.
We arrange the numbers $Z_1, \dots, Z_N$ of the sample according to their values
and denote by $Z_k^{*}$ the one for which there are exactly $k - 1$ smaller numbers in
the sample. The probability of ties is 0 since (33) is a continuous distribution
function. $R_N(x)$ is equal to $k/N$ in $Z_k^{*} < x \le Z_{k+1}^{*}$. In this interval


sup | Ry(x) — | = max | kiN — |] | | k/N — 1 \ 
siete | v | _ \| Ze MEAP If’ 

For Zi31 = k/N 

k IN | - ] T 

KN | g|@+DN ly 1 

| Sk41 4k+1 Jo N 
since k/N = fo implies that 5. > fo. For Ziti < k/N 

k N | A ] y 
WIN |g |Q4D/N _ | 
| Zita | Zi 4; 
Therefore, 
| k/N | Ry(x) —2 1 
max | — -—-lis su feeaees < max — 1 : 
k/Nely ny | Zt _ RN caen N x k aah N Li + fo N 
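and (37) is equivalent to
\[ \lim_{N \to \infty} P\Bigl[\max_{k/N \in I_{1/N}} \Bigl|\frac{k/N}{Z_k^{*}} - 1\Bigr| < \lambda N^{-1/2}\Bigr] = \Psi(\lambda). \tag{39} \]
We can write this equation as
\[ \lim_{N \to \infty} P\Bigl[\max_{k/N \in I_{1/N}} \Bigl|\ln\Bigl(\frac{k/N}{Z_k^{*}}\Bigr)\Bigr| < \lambda N^{-1/2}\Bigr] = \Psi(\lambda) \tag{40} \]
or, since $\log n - \sum_{r=1}^{n} 1/r \to c$,

\[ \lim_{N \to \infty} P\Bigl[\max_{k/N \in I_{1/N}} \Bigl|\ln\Bigl(\frac{1}{Z_k^{*}}\Bigr) - \sum_{l=k}^{N} \frac{1}{l}\Bigr| < \lambda N^{-1/2}\Bigr] = \Psi(\lambda). \tag{41} \]

The random variables $\ln(1/Z_k^{*})$ are not independent since they fulfill the inequalities

\[ \ln\Bigl(\frac{1}{Z_N^{*}}\Bigr) \le \ln\Bigl(\frac{1}{Z_{N-1}^{*}}\Bigr) \le \dots \le \ln\Bigl(\frac{1}{Z_1^{*}}\Bigr). \]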
However, they do form an additive Markov chain (cf. [24]), i.e., their differences

\[ \ln\Bigl(\frac{1}{Z_{k}^{*}}\Bigr) - \ln\Bigl(\frac{1}{Z_{k+1}^{*}}\Bigr) \]

are mutually independent. The variables

\[ U_l = (N + 1 - l)\Bigl(\ln\frac{1}{Z_{N+1-l}^{*}} - \ln\frac{1}{Z_{N+2-l}^{*}}\Bigr), \qquad l = 1, \dots, N, \]

have the distribution
\[ P[U_l \le x] = 1 - e^{-x}, \qquad 0 \le x < \infty. \]
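The exponential law of the normalized spacings $U_l$ can be illustrated by simulation. The sketch below rests on assumptions made here rather than taken from the paper: the convention $Z_{N+1}^{*} = 1$ (so that $U_1$ is defined), the sample size, the seed, and the function name `normalized_spacings`.

```python
import math
import random

def normalized_spacings(z):
    """Return U_1, ..., U_N built from a sample z of values in (0, 1].

    Uses the convention Z*_{N+1} = 1, an assumption made here so that U_1 is defined.
    """
    n = len(z)
    order = sorted(z)                                   # order[k-1] = Z*_k
    # logs[k-1] = ln(1/Z*_k) for k = 1..N, and logs[N] = ln(1/Z*_{N+1}) = 0
    logs = [math.log(1.0 / v) for v in order] + [0.0]
    # U_l = (N+1-l) * (ln(1/Z*_{N+1-l}) - ln(1/Z*_{N+2-l}))
    return [(n + 1 - l) * (logs[n - l] - logs[n + 1 - l]) for l in range(1, n + 1)]

rng = random.Random(1)
sample = [1.0 - rng.random() for _ in range(20000)]     # uniform values in (0, 1]
u = normalized_spacings(sample)

mean_u = sum(u) / len(u)
frac_below_1 = sum(1 for v in u if v <= 1.0) / len(u)
print(mean_u, frac_below_1, 1.0 - math.exp(-1.0))
```

For a unit exponential law the mean is 1 and $P[U_l \le 1] = 1 - e^{-1}$, so the first two printed values should lie close to 1 and to the third value respectively.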
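On the other hand we obtain
\[ \ln\Bigl(\frac{1}{Z_k^{*}}\Bigr) = \sum_{l=1}^{N+1-k} \frac{U_l}{N + 1 - l} \]
and

\[ \ln\Bigl(\frac{1}{Z_k^{*}}\Bigr) - \sum_{l=k}^{N} \frac{1}{l} = \sum_{l=1}^{N+1-k} \frac{U_l - 1}{N + 1 - l}. \]
Some moments of the variables
\[ V_l = N^{1/2}\, \frac{U_l - 1}{N + 1 - l} \]
are

\[ E(V_l) = 0, \qquad E(V_l^{2}) = \frac{N}{(N + 1 - l)^{2}}, \qquad E(|V_l|^{3}) = \Bigl(\frac{12}{e} - 2\Bigr) \frac{N^{3/2}}{(N + 1 - l)^{3}}. \]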
Let the set of integers $j$, for which $(N + 1 - j)/N \in I_{1/N}$, be
\[ J = \{0, 1, \dots, j_1;\; j_2, j_2 + 1, \dots, j_3;\; \dots;\; j_{2n}, \dots, j_{2n+1}\}. \]

The $j_i$ are defined such that $j_i/N \to 1 - f_{2n+1-i}$ as $N \to \infty$, for $i = 0, 1, \dots, 2n + 1$.


According to well-known rules for conditional distributions, we have 
l N 1 : N+1—k 
P| max In ( :) _ a: < AN ; = P| max > Vil < | 
k/NeT\ iN Zi tak L | k/NeIyin | l= 
= [. | I] Gos, 


il<a 


(42) int: 


} 
max > Vil <A, 
etl +sJana) i=1 
nm 


P| 
l=j2 
J2¥+1 

ia Te. 
i=l t SJ2» 
where $x_{-1} = 0$. The limits of the probabilities which occur in this integral can be
calculated by a limit theorem for partial sums of random variables.

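LEMMA. (See [13], [18].) Let $Y_{M1}, Y_{M2}, \dots, Y_{M m_M}$ be $m_M$ independent random
variables with

\[ E(Y_{Mj}) = 0, \qquad \sum_{j=1}^{m_M} E(Y_{Mj}^{2}) = 2 t_M. \]

Assume that for all $k$

\[ \frac{E(|Y_{Mk}|^{3})}{E(Y_{Mk}^{2})} < \mu(M) \tag{43} \]

where $\mu(M) \to 0$ as $M \to \infty$, and let $a$, $b$, $\xi$, and $\eta$ be any numbers such that $a < 0$,
$b > 0$, and $a \le \xi < \eta \le b$. Then

\[ \lim_{M \to \infty} P\Bigl[a < \sum_{j=1}^{k} Y_{Mj} < b,\ k = 1, \dots, m_M;\ \xi < \sum_{j=1}^{m_M} Y_{Mj} < \eta\Bigr] = u(0, 0), \tag{44} \]
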
where u(s, t) is the solution of the differential equation 


ss au anu 
(45) — 
ot Os 
for which the boundary conditions 
ule, 7) = 0, ce <He =< €, n<s<b, 


, u(s, T) = 1, E <2< 8, 
(46) 
u(a,t) = 0, eo -<¢ =F, 


u(b,t) =0, O<t<T, 
are fulfilled. 
We can apply this lemma to the variables $V_{j_{2\nu}+1}, V_{j_{2\nu}+2}, \dots, V_{j_{2\nu+1}}$, for
$\nu = 0, 1, \dots, n$, because these variables satisfy (43), with


7 3 
(N) = ne 
u(N) iN 
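The sum of the second moments

\[ \sum_{l=j_{2\nu}+1}^{j_{2\nu+1}} E(V_l^{2}) = N \sum_{l=j_{2\nu}+1}^{j_{2\nu+1}} \frac{1}{(N + 1 - l)^{2}} \]

tends towards

\[ 2T^{(2\nu)} = \frac{1 - f_{2n-2\nu}}{f_{2n-2\nu}} - \frac{1 - f_{2n-2\nu+1}}{f_{2n-2\nu+1}} = \frac{f_{2n-2\nu+1} - f_{2n-2\nu}}{f_{2n-2\nu}\, f_{2n-2\nu+1}}, \tag{47} \]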

and the boundaries for the partial sums are now 

(48) a= —X — Id] » b = r - Rs E = —iX — Ty be 
where $x_{-1} = 0$. The solution of (45), which satisfies the boundary conditions (46),
(48), is


220—Z20-1 


1 +20 
u(s,t) = | > (—1)? 
QV 9(T — t) ¢-r\-2_,-1 j= 


[ (e + 2-3 = (—1)e + 2s) — a | 
e exp — — dx. 
4(T?+) — 2) 


(49) 


Hence we have 


l J24 
lim = max z rel < &, > V; S & * V;= Pars | 


N +00 lejo,t+l, ° 32741 t=] i=] 
(x — (—1)’ ry-1 — 2dj)* 
— . = (—1)’ exp | - meat ) ae 


On the other hand, we apply the Central Limit theorem to the variables
$V_{j_{2\nu-1}+1}, V_{j_{2\nu-1}+2}, \dots, V_{j_{2\nu}}$, obtaining


lim p| > ¥.6%4 2 Vi ts | 


Nw i=1 i=] 
1 a 1~Z2»-2 afl ar(? j 
= = e dx 
20 2T® Le 


where the $T^{(2\nu-1)}$ are defined in the same way as the $T^{(2\nu)}$ in (47).
In view of (42), (50) and (51) it follows that


| N+1-k | 1 
lim P| max D» Vil< |= ~ a poy 


aicacad k/NeIy iN Tost ont TY (T° 


(50) 


Jav-1+t1> 


lA 


(51) 











j=l 
(52) ~ ~ ta — —] 7 2-1 2d ¥ , 
, ° = [. ° | exp ->. (x2 a ( ) o = - Py) 
ee rad 4 Tey 
zyl<~A 
art1 — 2») 
= > APann diy +++ dXon, 
van() 7 

where $x_{-1} = 0$. This expression is $\Psi(\lambda)$. This proves (37) and consequently
Theorem 3.

Theorem 4 can be proved in the same way. In the lemma of Kolmogorov we
replace $a$ and $\xi$ by $-\infty$. The solution of the boundary problem is now


T2»—Z2y-1 1 
shi = --atiaiaia > (-1) 
2V a(Te*) — t) Le i=0 


io £ tes — et oes Da PAN") a 


(53) 
and we obtain


l J2r41 
lim P| max (> i) <>, ( = v.) S «ty ( 7 v.) = rs | 
N+x l=joytl,-**sJ2n41 i=l i=l tSJep 


eZ2y 1 (x _ _ " ~~ 9 *\2 
+ tes [7 t- 0 p[ -2= P= 9") 
2 aTOr) Le io — 


From that Theorem 4 follows. 


lA 


(54) 
INCOMPLETE SUFFICIENT STATISTICS AND SIMILAR TESTS¹


By Robert A. Wijsman²
University of California, Berkeley


0. Summary. For a family of exponential densities a method is given, called 
“D method,” for constructing a class of similar tests in the case that the mini- 
mal sufficient statistic is boundedly incomplete. This method also provides a 
proof of a criterion for bounded incompleteness. Under certain conditions the 
criterion states that a sufficient statistic for a family of exponential densities is 
boundedly incomplete if the number of components of the statistic is larger than 
the number of parameters specifying the distribution. Applications are indicated 
in the Behrens-Fisher problem, and in the problem of testing the ratio of mean to 
standard deviation in a normal population. In the latter problem it is shown 
that the D method generates the whole class of similar tests. Some unsolved 
problems concerning the existence of an optimal similar test are indicated. 


1. Introduction. Lehmann and Scheffé [8], [9] have introduced the concept 
of completeness of a family of measures and have shown the usefulness of this 
notion both for unbiased estimation and for the construction of similar regions. 
The latter were introduced by Neyman and Pearson [11] as a means to cope with 
tests of composite hypotheses. If the hypothesis is composite only because of
nuisance parameters, then the requirement of similarity of the test is often a convenient means of restricting the class of tests to be considered. If the hypothesis
is composite both because of nuisance parameters and because the parameter tested is
not completely specified by the hypothesis, then similarity is often required if
the test is to be unbiased. For instance, let $\theta$ be a real parameter, $\tau$ a possibly
vector valued nuisance parameter, and let the hypothesis be $H\colon \theta \le \theta_0$, the alternative $\theta > \theta_0$, for some specified $\theta_0$. Suppose we want the test to be unbiased; then the power function of the test has to be $\le \alpha$ for $\theta \le \theta_0$ and $\ge \alpha$ for
$\theta > \theta_0$, where $\alpha$ is the level of significance. If, in addition, the power function
is continuous, which is usually the case, then we have automatically that its
value on the surface $\theta = \theta_0$ equals $\alpha$, identically in $\tau$. Search for an optimum
unbiased test reduces then to the simpler problem of search for an optimum
similar test of the hypothesis $H_1\colon \theta = \theta_0$ against $\theta > \theta_0$.

In the presence of a sufficient statistic there exists a special class of easily 
constructible similar regions [10], termed similar regions of Neyman structure by 
Lehmann and Scheffé [8]. They proved that every similar region is of Neyman 
structure if and only if the family of distributions of the sufficient statistic, as 
specified by the hypothesis, is boundedly complete [8]. Unfortunately, there 
are important problems in which the latter condition is not fulfilled, in which
case the class of all similar regions is larger than the class of similar regions of 
Neyman structure. An example is the Behrens-Fisher problem (see, for example, 
[13], in which also references to earlier work can be found). In this problem the 
similar regions of Neyman structure are of no use, since for any such region the 
power function is identically constant. 

All remarks in the previous paragraph are equally valid if instead of similar 
rejection regions we consider randomized similar tests. It is clear from the dis- 
cussion that in each problem of testing a composite hypothesis by means of a 
similar test it is important to know whether or not the problem admits a bound- 
edly complete sufficient statistic. If not, one would like to have a method of 
constructing all similar tests. It is the purpose of this paper to provide partial 
answers to these problems. In section 3 a method termed the “D method,” will 
be given for the construction of a large class of similar tests in the case of a 
family of exponential densities. In section 5 the D method will be used to de- 
rive a criterion for bounded incompleteness in the case of a family of exponential 
densities. Two examples of the D method are given in section 4; the first ex- 
ample is the Behrens-Fisher problem, the second example is the problem of 
testing the ratio of mean to standard deviation in a normal population. For 
the latter problem it is proved in section 6 that every similar test can be con- 
structed by the D method, provided this method is given sufficiently wide 
scope. Some remarks on the problem of finding an optimal similar test are made 
in section 7. A preliminary account of the results of sections 3 and 5 appeared 
in [16]. 


2. Similar tests and boundedly incomplete sufficient statistics. Let $\mathcal{X}$ be a
space of points $x$, $\mathcal{A}$ a $\sigma$-field of subsets of $\mathcal{X}$ (with $\mathcal{X} \in \mathcal{A}$), and $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$
a family of probability measures on $(\mathcal{X}, \mathcal{A})$. Expectation with respect to $P_\theta$ will
be denoted by $E_\theta$. If $\omega \subset \Omega$ and $T$ is a sufficient statistic for $\mathcal{P}_\omega = \{P_\theta, \theta \in \omega\}$,
we shall also say that $T$ is a sufficient statistic for $\omega$. The range of $T$ is denoted
by $\mathcal{T}$, and is understood to be a Borel subset of a Euclidean space. Let $\mathcal{B}$ be the
$\sigma$-field of Borel subsets of $\mathcal{T}$. We recall the following definitions: A sufficient
statistic for $\omega$ is called minimal if the sufficient sub $\sigma$-field which it induces in
$\mathcal{X}$ is “essentially”$^3$ contained in every sufficient sub $\sigma$-field for $\omega$ (see Bahadur
[2] for a precise definition). A sufficient statistic $T$ for $\omega$ is called complete for
$\omega$ if, for every $\mathcal{B}$-measurable numerical function $g$

\[ E_\theta\, g(T) = 0 \ \text{ for all } \theta \in \omega \ \Rightarrow\ g = 0 \ \text{a.e. } (\mathcal{P}_\omega). \tag{1} \]

If the implication (1) holds for every bounded $\mathcal{B}$-measurable numerical function,
then $T$ is called boundedly complete for $\omega$. The following implications are true
[8]:

\[ \text{Completeness} \Rightarrow \text{bounded completeness} \Rightarrow \text{minimality}. \tag{2} \]

3 The term minimal was introduced by Lehmann and Scheffé [8], whereas Bahadur [2]
describes the same concept with the term necessary and sufficient statistic.
Suppose a composite hypothesis $H$ specifies $\theta \in \omega \subset \Omega$. We shall consider
randomized tests for $H$ with test functions $\phi$, where, for each $x \in \mathcal{X}$, $0 \le \phi(x) \le 1$, $\phi$ is $\mathcal{A}$-measurable, and $H$ is rejected with probability $\phi(x)$ if $x$ is observed. Among
all tests we restrict ourselves to similar tests, defined by the condition that
$E_\theta \phi$ is independent of $\theta$ if $\theta \in \omega$. If $T$ is a sufficient statistic for $\omega$, $\mathcal{A}_0 \subset \mathcal{A}$ its
sufficient sub $\sigma$-field and $\phi$ any test, we can consider the $\mathcal{A}_0$-measurable function $E(\phi \mid \mathcal{A}_0)$. If $\alpha$ is a number, $0 < \alpha < 1$, and if $\phi$ is such that $E(\phi \mid \mathcal{A}_0) = \alpha$,
then clearly $E_\theta \phi = \alpha$ for all $\theta \in \omega$, so that $\phi$ is similar. Such a $\phi$ is called a test
of Neyman structure [8]. If $T$ is a boundedly complete sufficient statistic for $\omega$,
then every similar test has Neyman structure [8]. On the other hand, if a sufficient statistic $T$ is not boundedly complete for $\omega$, then there exist similar tests
which do not have Neyman structure. This follows from the fact that
the bounded incompleteness implies the existence of a $\mathcal{B}$-measurable numerical
function $g$ on $\mathcal{T}$, bounded below by $-\alpha$, above by $1 - \alpha$, different from 0 on a
set of positive probability (with respect to $\mathcal{P}_\omega$), with $E_\theta\, g(T) = 0$ for all $\theta \in \omega$.
With $f$ on $\mathcal{X}$ defined by $f(x) = g(T(x))$, we have that $\phi = f + \alpha$ is similar of
size $\alpha$, but $E(\phi \mid \mathcal{A}_0) - \alpha = f \ne 0$ on a set of positive probability, so $\phi$ is not
a test of Neyman structure. Conversely, for any similar test $\phi$ we can form the
function $f = E(\phi \mid \mathcal{A}_0) - \alpha$ and define $g$ on $\mathcal{T}$ by $g(T(x)) = f(x)$, so that

\[ E_\theta\, g(T) = 0 \]

for all $\theta \in \omega$. It follows that all similar tests can be found by constructing all
bounded numerical functions $g$ on $\mathcal{T}$ whose expectations vanish for all $\theta \in \omega$.


3. The D method for constructing similar tests in the case of a family of regular exponential densities. In this section the restriction of $\theta$ to $\omega$ will be understood. Let the distribution of $T$, induced by $P_\theta$, have a density with respect to
$m$-dimensional Lebesgue measure, and let this density $p_\theta$ be of the form

\[ p_\theta(t) = c(\theta) \exp\Bigl[-\sum_{i=1}^{m} s_i(\theta)\, t_i\Bigr] h(t) \tag{3} \]

in which $t = (t_1, \dots, t_m)$, and $s_1, \dots, s_m$ are real valued functions on $\omega$. We
shall assume that the function $h$ is of such a nature that it is possible to find a
closed $m$-dimensional cube $C$ on which $h$ is bounded away from 0. With this
restriction on $h$, the family (3) will be called a family of regular exponential
densities. Exponential densities which arise in statistics are always regular.

If $\omega$ is an $m$-dimensional subset of an $m$-dimensional Euclidean space, then,
under mild conditions, $T$ with density (3) is complete for $\omega$ [9]. In that case
every similar test has Neyman structure. From the point of view of the present
paper the interesting case arises when $\omega$ is a subset of an $m - 1$ dimensional
Euclidean space. In that case $\theta$ has at most $m - 1$ components, so that the $m$
functions $s_i$ are functions of at most $m - 1$ parameters. Eliminating those
parameters will result in a functional relation between the $s_i$. Suppose that
this relation can be put in the form

\[ P(s_1, \dots, s_m) = 0 \tag{4} \]

in which $P$ is a polynomial of positive degree in at least one of the $s_i$. It should
be kept in mind that (4) holds identically in $\theta$.

As discussed in section 2, a similar test of non-Neyman structure can be
constructed by constructing a bounded function $g$ on $\mathcal{T}$, $g \ne 0$ on a set of positive probability, such that

\[ \int g(t)\, p_\theta(t)\, dt = 0. \tag{5} \]

Using (3), and remembering that $h$ is bounded away from 0 on some $m$-dimensional
cube $C$, it suffices to construct a bounded function $F$ which is $\ne 0$ on a subset
of $C$ of positive Lebesgue measure, vanishes outside $C$, and satisfies

\[ \int F(t) \exp\Bigl[-\sum_{i=1}^{m} s_i(\theta)\, t_i\Bigr] dt = 0. \tag{6} \]

The function $g$ in (5) can then be taken as $F/h$. The left hand side of (6) is the
$m$-dimensional Laplace transform of $F$, denoted by $\mathcal{L}(F)$:

\[ \int F(t_1, \dots, t_m) \exp\Bigl[-\sum_{i=1}^{m} s_i t_i\Bigr] dt = \mathcal{L}(F)(s_1, \dots, s_m). \tag{7} \]

The problem is to construct $F$ in such a way that $\mathcal{L}(F) = 0$ for all values of
$s(\theta)$, $\theta \in \omega$. This can be done with help of (4). Let $P$ be of degree $d$ and let $G$
be a function on $\mathcal{T}$ possessing all partial derivatives of $d$th order in the interior
of $C$, vanishing outside $C$, and having all partial derivatives of $(d - 1)$st order
continuous on the boundary of $C$. An example of such a function is the following. Let $C$ be given by $a_i \le t_i \le a_i + 1$ $(i = 1, \dots, m)$, then on $C$ we can take
$G(t) = \prod_{i=1}^{m} (t_i - a_i)^{d} (a_i + 1 - t_i)^{d}$. Now denote by $D$ the differential operator

\[ D = P\Bigl(\frac{\partial}{\partial t_1}, \dots, \frac{\partial}{\partial t_m}\Bigr). \tag{8} \]

We then have

\[ \mathcal{L}(DG)(s) = P(s)\, \mathcal{L}(G)(s) \tag{9} \]

in which $s = (s_1, \dots, s_m)$. Since the right hand side of (9) is $\equiv 0$ by (4), we
may take $F$ in (7) to be $F = DG$. The final result is therefore

\[ g(t) = (DG(t))/h(t) \tag{10} \]

for suitably chosen $G$, and

\[ \phi(t) = \alpha + (DG(t))/h(t) \tag{11} \]

is a size $\alpha$ similar test of non-Neyman structure.

Even for one m-dimensional cube C the number of choices for G is large. In 
addition there will usually be a large number of m-dimensional cubes on each 
of which h is bounded away from 0, and finally one may consider regions other 
than cubes for which the construction of functions G is possible. Thus, there 
will be a large class of functions g satisfying (5) which can be generated by the 
differential operator method, called the D method henceforth. Whether this
method, in general, will give all those functions $g$ is still an open question. In
one particular case the question has been answered in the affirmative, provided
the definition of D method is taken sufficiently wide (see section 6).

Suppose that with the help of the D method a similar test $\phi(T)$ is constructed,
and that it is desired to consider similar tests which do not necessarily depend
on $T$ only. Let $\psi$ be a test function defined on the sample space $\mathcal{X}$. If $\psi$ is chosen
to satisfy $E(\psi \mid t) = \phi(t)$, then $\psi$ is also similar. In particular, it will usually
be possible to construct in this way a similar rejection region $w$, in which case
$\psi$ is the indicator of $w$ (this construction fails if $\mathcal{X}$ is a subspace of a Euclidean
space with same dimension as $\mathcal{T}$). A similar region $w$ is constructed by demanding

\[ P(w \mid t) = \phi(t). \tag{12} \]

In other words, on each surface $T = t$ in the sample space a region is selected
which has conditional probability $\phi(t)$. This generalizes the construction of a
similar region of Neyman structure [10]. Equation (12) will be used in section
4, example 2.


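4. Examples of the D method. EXAMPLE 1 (Behrens-Fisher problem). Let
$X_1, \dots, X_{n_1}$ be $n_1$ independent observations on a normal variable with mean
$\mu_1$, variance $\sigma_1^2$, and $Y_1, \dots, Y_{n_2}$, $n_2$ independent observations on a normal
variable with mean $\mu_2$, variance $\sigma_2^2$. The X's and Y's are independent, and all
parameters are unknown. Under the hypothesis tested, which is $\mu_1 = \mu_2$, the
joint distribution of the X's and Y's has an exponential density with exponential
factor

\[ \exp\Bigl[-\frac{1}{2\sigma_1^2} \sum_{i=1}^{n_1} x_i^2 + \frac{\mu}{\sigma_1^2} \sum_{i=1}^{n_1} x_i - \frac{1}{2\sigma_2^2} \sum_{i=1}^{n_2} y_i^2 + \frac{\mu}{\sigma_2^2} \sum_{i=1}^{n_2} y_i\Bigr] \]

in which $\mu$ is the common value of $\mu_1$ and $\mu_2$. We may take

\[ T_1(x) = \sum_{i=1}^{n_1} x_i^2, \quad T_2(x) = \sum_{i=1}^{n_1} x_i, \quad T_3(x) = \sum_{i=1}^{n_2} y_i^2, \quad T_4(x) = \sum_{i=1}^{n_2} y_i, \]
\[ s_1(\theta) = \frac{1}{2\sigma_1^2}, \quad s_2(\theta) = -\frac{\mu}{\sigma_1^2}, \quad s_3(\theta) = \frac{1}{2\sigma_2^2}, \quad s_4(\theta) = -\frac{\mu}{\sigma_2^2}. \]

The $s_i$ are linearly independent, from which it can be shown that

\[ T = (T_1, T_2, T_3, T_4) \]

is a minimal sufficient statistic for $\omega$. $T$ has a regular exponential density of
form (3), with

\[ h(t) = (n_1 t_1 - t_2^2)^{(n_1 - 3)/2}\, (n_2 t_3 - t_4^2)^{(n_2 - 3)/2} \tag{13} \]

if $n_1 t_1 \ge t_2^2$, $n_2 t_3 \ge t_4^2$, and $h(t) = 0$ otherwise. By eliminating $\mu$, $\sigma_1$, $\sigma_2$ from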
the four $s_i$ we obtain $s_1 s_4 - s_2 s_3 = 0$ as a realization of (4). The differential
operator $D$ in (8) is then

\[ D = \frac{\partial^2}{\partial t_1\, \partial t_4} - \frac{\partial^2}{\partial t_2\, \partial t_3} \tag{14} \]

and for suitably chosen $G$ the test

\[ \phi(t) = \alpha + h^{-1}(t) \Bigl(\frac{\partial^2}{\partial t_1\, \partial t_4} - \frac{\partial^2}{\partial t_2\, \partial t_3}\Bigr) G(t) \tag{15} \]

is similar and of size $\alpha$, where $h(t)$ is given by (13). Whether this method can
be used to show the existence of an invariant similar region, such as the one
proposed by Welch [1], [15], has not yet been investigated.
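The elimination step of Example 1 can be verified with exact rational arithmetic. The sketch below is only an illustration: it assumes the parametrization $s_1 = 1/(2\sigma_1^2)$, $s_2 = -\mu_1/\sigma_1^2$, $s_3 = 1/(2\sigma_2^2)$, $s_4 = -\mu_2/\sigma_2^2$, and the function names and parameter values are hypothetical choices made here.

```python
from fractions import Fraction

def s_functions(mu1, mu2, var1, var2):
    """Return (s1, s2, s3, s4) under the assumed parametrization
    s1 = 1/(2 var1), s2 = -mu1/var1, s3 = 1/(2 var2), s4 = -mu2/var2."""
    return (1 / (2 * var1), -mu1 / var1, 1 / (2 * var2), -mu2 / var2)

def relation(s):
    """Polynomial P(s) = s1*s4 - s2*s3 used as the realization of relation (4)."""
    s1, s2, s3, s4 = s
    return s1 * s4 - s2 * s3

# On the hypothesis mu1 = mu2 the polynomial vanishes identically in the parameters.
hypothesis_points = [
    (Fraction(3, 2), Fraction(1, 4), Fraction(5)),
    (Fraction(-2), Fraction(7, 3), Fraction(1, 9)),
    (Fraction(0), Fraction(1), Fraction(2)),
]
on_values = [relation(s_functions(mu, mu, v1, v2)) for mu, v1, v2 in hypothesis_points]

# Off the hypothesis (mu1 != mu2) it is in general different from zero.
off_value = relation(s_functions(Fraction(2), Fraction(1), Fraction(1), Fraction(3)))
print(on_values, off_value)
```

Since $P$ has degree 2 here, the corresponding operator $D$ is of second order, in agreement with (14).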

It should perhaps be mentioned here that the approach to the Behrens-Fisher 
problem by Wald [14] is essentially different, since Wald does not require the 
test to be similar. 

EXAMPLE 2 (Standardized mean of a normal population). Suppose we make
$n + 1$ independent observations on a normal variable and consider hypotheses
concerning the ratio of mean to standard deviation. By an orthogonal transformation this problem can be brought in the following form: Let $X_0, \dots, X_n$
be independent and normal, with common, unknown variance $\sigma^2$. $X_0$ has unknown mean $\mu$, $X_1, \dots, X_n$ have mean 0. Denote $\mu/\sigma = r$, then for some
given $r_0$ the hypothesis tested is $r = r_0$. For the time being the alternative to
be considered is immaterial. For later reference, however, suppose that the
alternative is $r > r_0$. We then have

\[ \Omega = \{(r, \sigma)\colon r \ge r_0,\ \sigma > 0\}, \]

\[ \omega = \{(r, \sigma)\colon r = r_0,\ \sigma > 0\}. \]

Under the hypothesis the joint distribution of the $X_i$ has the form given by
(3), with exponential factor

\[ \exp\Bigl[-\frac{1}{2\sigma^2} \sum_{i=0}^{n} x_i^2 + \frac{r_0}{\sigma}\, x_0\Bigr] \]
so that we may take

\[ T_1(x) = \sum_{i=0}^{n} x_i^2, \quad T_2(x) = x_0, \quad s_1(\sigma) = \frac{1}{2\sigma^2}, \quad s_2(\sigma) = -\frac{r_0}{\sigma}. \]

$T = (T_1, T_2)$ is minimal sufficient, since $s_1$ and $s_2$ are linearly independent.
Elimination of $\sigma$ from $s_1$ and $s_2$ gives $s_2^2 - 2 r_0^2 s_1 = 0$, so that we can take
\[ P(s_1, s_2) = s_2^2 - 2 r_0^2 s_1 \tag{16} \]
and
\[ D = \frac{\partial^2}{\partial t_2^2} - 2 r_0^2\, \frac{\partial}{\partial t_1}. \tag{17} \]
The function $h$ in (3) is found to be
\[ h(t_1, t_2) = (t_1 - t_2^2)^{(n-2)/2} \tag{18} \]
if $t_1 \ge t_2^2$ and $h = 0$ otherwise. For suitably chosen $G(t_1, t_2)$ the test function

\[ \phi(t_1, t_2) = \alpha + (t_1 - t_2^2)^{-(n-2)/2} \Bigl(\frac{\partial^2}{\partial t_2^2} - 2 r_0^2\, \frac{\partial}{\partial t_1}\Bigr) G(t_1, t_2) \tag{19} \]
is similar and of size $\alpha$.
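The key identity behind the D method, $\mathcal{L}(DG)(s) = P(s)\,\mathcal{L}(G)(s)$, can be checked numerically for the operator of Example 2. The following is a sketch under assumptions made here, not the author's computation: the cube $C = [2, 3] \times [0, 1]$ (which lies inside $\{t_1 > t_2^2\}$), the product function $G(t_1, t_2) = g(t_1 - 2)\, g(t_2)$ with $g(u) = u^2 (1 - u)^2$, the value $r_0 = 1$, and Simpson's rule for the integrals are all illustrative choices.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

# One-dimensional bump on [0, 1] and its derivatives; g and g' vanish at 0 and 1.
def g(u):
    return u**2 * (1.0 - u)**2

def dg(u):
    return 2.0 * u - 6.0 * u**2 + 4.0 * u**3

def d2g(u):
    return 2.0 - 12.0 * u + 12.0 * u**2

A1 = 2.0  # assumed cube C = [2, 3] x [0, 1]

def laplace_pair(r, sigma, r0):
    """Return (L(DG), L(G)) at s1 = 1/(2 sigma^2), s2 = -r/sigma.

    G(t1, t2) = g(t1 - A1) * g(t2) and D = d^2/dt2^2 - 2 r0^2 d/dt1.
    The double integral over C factorizes into one-dimensional integrals.
    """
    s1 = 1.0 / (2.0 * sigma**2)
    s2 = -r / sigma
    i_g1 = simpson(lambda t: g(t - A1) * math.exp(-s1 * t), A1, A1 + 1.0)
    i_dg1 = simpson(lambda t: dg(t - A1) * math.exp(-s1 * t), A1, A1 + 1.0)
    i_g2 = simpson(lambda t: g(t) * math.exp(-s2 * t), 0.0, 1.0)
    i_d2g2 = simpson(lambda t: d2g(t) * math.exp(-s2 * t), 0.0, 1.0)
    l_dg = i_g1 * i_d2g2 - 2.0 * r0**2 * i_dg1 * i_g2
    return l_dg, i_g1 * i_g2

r0 = 1.0
# On the hypothesis r = r0 the transform of DG vanishes for every sigma.
on_hypothesis = [laplace_pair(r0, sigma, r0)[0] for sigma in (0.5, 1.0, 2.0)]
# Off the hypothesis (here r = 2) it does not.
off_hypothesis = laplace_pair(2.0, 1.0, r0)[0]
print(on_hypothesis, off_hypothesis)
```

The on-hypothesis values are zero up to quadrature error, which is what makes $DG/h$ a function whose expectation vanishes for every $\sigma$; the off-hypothesis value equals $P(s)\,\mathcal{L}(G)(s)$ with $P(s) = s_2^2 - 2 r_0^2 s_1 \ne 0$.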
Equation (19) can be used to demonstrate the existence of similar tests which
are not invariant. In the present problem an invariant test is a function of
$T_2/\sqrt{T_1}$ only. Choose for $G$ in (19) the following function:


(20) G(t,, ) = c(h — 6)? Me™ 


if $t_1 \ge t_2^2$ and $G = 0$ otherwise, with $c > 0$ chosen so small that $\phi$ is bounded
between 0 and 1. It is easily checked that after substitution into (19) the resulting test function is not invariant. This example can also be used to show
the existence of similar rejection regions which are not equivalent to a cone in
the sample space (we shall call two tests equivalent if they have the same power
function, and by a cone is meant a union of rays through the origin). If $w$ is
any rejection region, $\phi$ the corresponding test function, given by (12), then $w$
and $\phi$ are equivalent since $T$ is sufficient, not only for $\omega$, but also for $\Omega$. If $w_1$ is
a cone in the sample space, then the corresponding $\phi_1$ is invariant. Let $w_2$ be
any rejection region equivalent to $w_1$, $\phi_2$ the corresponding test function; then
$\phi_1$ and $\phi_2$ are equivalent. Now $T$ is not only sufficient for $\Omega$, it is also complete
for $\Omega$. Since $\phi_1$ and $\phi_2$ have the same power functions, it follows then that $\phi_1 = \phi_2$ a.e. and thus $\phi_2$ is also invariant. The existence of a noninvariant similar test
$\phi$ implies then the existence of a similar region which is not equivalent to any
cone in the sample space.$^4$


5. A criterion for bounded incompleteness in the case of regular exponential
densities. Let the family of distributions be given by (3), with $\theta \in \omega$. By (2),
if $T$ is not minimal sufficient for $\omega$, then $T$ cannot be boundedly complete. This
happens, for instance, if the $s_i$ are linearly dependent on $\omega$ because the exponent
$-\sum s_i t_i$ in (3) can then be written as a linear combination of fewer than $m$ of
the $t_i$. The incompleteness in this case also follows from the applicability of
the D method of section 3, because of the existence of a polynomial $P$, linear
in this case, for which (4) holds. On the other hand, if the $m$ functions $s_i$ are
linearly independent on $\omega$, then $T$ is minimal sufficient for $\omega$. Even if this is the
case, $T$ may still be boundedly incomplete. Theorem 2 below tells when this
will happen. Its proof uses the D method of section 3. The conditions of
Theorem 2 are designed to guarantee the existence of the polynomial $P$ on the left





⁴ This seems to contradict a statement by Patnaik [12] to the effect that in the problem
under consideration every similar region is equivalent to a cone in the sample space.
However, Patnaik's proof is unconvincing, and the non-invariant $\varphi$ exhibited above provides a
counter example.
hand side of (4), such that $P$ is not the zero polynomial. This is made possible
by the following theorem, due to A. Seidenberg (private communication). The 
proof is given in Appendix 1. 

THEOREM 1 (Seidenberg). Let for each $i$, $i = 1, \ldots, m$, $P_i(s_i; \theta_1, \ldots, \theta_k)$
be a polynomial in $s_i$ and the $\theta_j$ ($j = 1, \ldots, k$), with coefficients in some field $K$,
where $k < m$ and $P_i$ is of positive degree in $s_i$. Let $A_i(\theta)$ be the leading coefficient
of $P_i$ as a polynomial in $s_i$. Then there is a polynomial $P(s_1, \ldots, s_m)$ with
coefficients in $K$, which is not the zero polynomial, and a power product $B(\theta)$ of the
$A_i(\theta)$, such that $B(\theta)P(s) = 0$ whenever $P_i = 0$ for all $i$.

COROLLARY. If $\theta$ is restricted to a set $\Theta$, and if, for each $\theta \in \Theta$ and each $i$,

$$A_i(\theta) \ne 0,$$

then $P = 0$ whenever $P_i = 0$ for all $i$.

For, if $A_i(\theta) \ne 0$, $i = 1, \ldots, m$, then $B(\theta) \ne 0$.

In the application we want to make of the corollary, the set $\Theta$ is $\omega$. Furthermore,
we shall assume the $s_i$ of section 3 to be algebraic functions of the $\theta_j$,
for $\theta \in \omega$. Then for each $i$ there is a polynomial $P_i$ in $s_i$ and the $\theta_j$ such that
$P_i(s_i; \theta_1, \ldots, \theta_k) = 0$ if $\theta \in \omega$. We shall further assume that the $A_i(\theta)$ are
$\ne 0$ if $\theta \in \omega$. These conditions will be satisfied in particular if, for each $i$, $s_i$ on
$\omega$ is a rational function of the $\theta_j$, with nonvanishing denominator.

THEOREM 2. Suppose a family of regular exponential densities is given by (3),
with $\theta \in \omega$; $\omega$ is a subset of a $k$-dimensional Euclidean space, with $k < m$; on $\omega$,
the $m$ functions $s_i$ are algebraic functions of the $k$ parameters $\theta_j$, so that

$$P_i(s_i; \theta_1, \ldots, \theta_k) = 0$$

for some polynomial $P_i$ ($i = 1, \ldots, m$); $A_i(\theta)$, the leading coefficient of $P_i$ as a
polynomial in $s_i$, does not vanish anywhere on $\omega$ for any $i$. Then $T$ is boundedly
incomplete for $\omega$.

The proof follows immediately from the constructibility, by the D method
of section 3, of a bounded function $g$, $g \ne 0$ on a set of positive probability,
satisfying $E_\theta\, g(T) = 0$ for all $\theta \in \omega$.

In both examples in section 4 the $s_i$ are rational functions of the $\theta_j$, with
nonvanishing denominators, and in both cases k = m — 1 < m, so that The- 
orem 2 applies. This provides another proof of the well-known fact that in the 
Behrens-Fisher problem, as well as in the problem of testing the ratio of mean 
to standard deviation in a normal population, the minimal sufficient statistic 
is boundedly incomplete. 

It would be interesting to know how much the assumptions of Theorem 2
can be relaxed. It is certainly not necessary that the $s_i$ be algebraic functions of
the $\theta_j$, for, if $m = 2$, $k = 1$, $s_1 = \cos\theta$, $s_2 = \sin\theta$, then $s_1^2 + s_2^2 - 1 = 0$, as
a realization of (4), so that the D method applies. It is not even necessary for
incompleteness that there exists a polynomial $P$ in the $s_i$ which vanishes for
all $\theta \in \omega$, as the next example will show. Take $m = 2$, $k = 1$, $s_1 = -\ln\theta$,
$s_2 = -\ln(1 - \theta)$, with $0 < \theta < 1$. Instead of (4) we have a transcendental equation:
$\exp[-s_1] + \exp[-s_2] - 1 = 0$. With help of this equation one can easily
construct functions $F$ of the kind mentioned in section 3. For example, the
function $F$ whose 2-dimensional Laplace transform is

$$\mathcal{L}(F)(s_1, s_2) = \frac{1}{s_1 s_2}\left(e^{-s_1} + e^{-s_2} - 1\right)\left(e^{-a_1 s_1} - e^{-b_1 s_1}\right)\left(e^{-a_2 s_2} - e^{-b_2 s_2}\right)$$

is bounded between $-1$ and 1, vanishes outside the rectangle

$$a_1 \le t_1 \le b_1 + 1,\quad a_2 \le t_2 \le b_2 + 1,$$

and has vanishing Laplace transform for all $\theta$ between 0 and 1. On the other
hand, the fact that $k < m$ is not sufficient for bounded incompleteness, nor is
the additional restriction of analyticity of the $s_i$ sufficient. The following
example is due to L. J. Savage (private communication). In (3) choose $m = 2$,
$k = 1$, $s_1 = \theta\cos\theta$, $s_2 = \theta\sin\theta$ ($\theta > 0$), $h(t) = 1$ for $t$ in some square, $h = 0$
otherwise. Here the $s_i$ are analytic functions of $\theta$, but yet it can be shown that
the family of distributions is complete. Another example is due to D. L. Burkholder
(private communication) and differs from Savage's example only in that
$s_1 = \theta\cos(1/\theta)$, $s_2 = \theta\sin(1/\theta)$. This example is a little less regular than
Savage's example, but on the other hand the completeness of the family of
distributions is easier to show.
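The transcendental example ($s_1 = -\ln\theta$, $s_2 = -\ln(1-\theta)$) can be illustrated numerically. The sketch below is not the paper's construction verbatim: it assumes one concrete choice of $F$, namely a signed sum of indicators of three unit squares, with hypothetical corner parameters `a` and `b`. Its Laplace transform carries the factor $e^{-s_1} + e^{-s_2} - 1$, which is zero on the curve, so the transform vanishes for every $\theta$ in $(0, 1)$ while $F$ itself takes only the values $-1$, 0 and 1.

```python
import math


def box_transform(c, d, s1, s2):
    """Laplace transform of the indicator of the unit square [c, c+1] x [d, d+1]."""
    f1 = (math.exp(-c * s1) - math.exp(-(c + 1) * s1)) / s1
    f2 = (math.exp(-d * s2) - math.exp(-(d + 1) * s2)) / s2
    return f1 * f2


def transform_F(s1, s2, a=0.0, b=0.0):
    """Transform of F = 1[shifted in t1] + 1[shifted in t2] - 1[original square].

    Algebraically this equals (e^{-s1} + e^{-s2} - 1) * box_transform(a, b, s1, s2).
    """
    return (box_transform(a + 1, b, s1, s2)
            + box_transform(a, b + 1, s1, s2)
            - box_transform(a, b, s1, s2))


def on_curve(theta):
    """Point of the parameter curve s1 = -ln(theta), s2 = -ln(1 - theta)."""
    return -math.log(theta), -math.log(1.0 - theta)


values_on_curve = [transform_F(*on_curve(th)) for th in (0.1, 0.3, 0.5, 0.9)]
value_off_curve = transform_F(1.0, 1.0)  # a point not on the curve

print(values_on_curve)
print(value_off_curve)
```

The three squares are disjoint, which is why this particular $F$ stays between $-1$ and 1; away from the curve the transform is not zero, so $F$ is a genuinely nontrivial function.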


6. Completeness of the D method in the case of a hypothesis concerning the
standardized mean of a normal population. In this section it will be shown that
in Example 2 of section 4 all similar tests can be generated by the D method,
provided the D method is defined in a sufficiently broad manner. That is, we
want to show that for each similar test $\varphi$ there exists a function $G$ satisfying
(19) and certain other conditions. In section 3 the functions $G$ were restricted
to some $m$-dimensional cube on which $h$ is bounded away from 0 but it was
remarked there that this restriction is not necessary. We shall not even demand
that $G = 0$ whenever $h = 0$. In fact, the main thing of importance was the
validity of (9), and even this we shall relax slightly in the problem under
consideration.

Equation (19) can be put in the form

$$\left(\frac{\partial}{\partial t_1} - \frac{1}{2r_0^2}\,\frac{\partial^2}{\partial t_2^2}\right) G = \frac{\sqrt{2\pi}}{r_0}\, g \tag{21}$$

where $g$ is defined by

$$g(t) = -\left(\sqrt{8\pi}\, r_0\right)^{-1} h(t)\,(\varphi(t) - \alpha). \tag{22}$$

Equation (21) can be considered as the heat equation in one dimension, if $t_1$ is
interpreted as time, $t_2$ as position, $G$ as temperature, and $(\sqrt{2\pi}/r_0)\,g$ as a heat
source, capable of producing both positive and negative heat, whose strength
and spatial distribution varies with time. If this were an actual heat problem,
its solution could be written down at once, employing the usual Green's
function for the heat operator:
$$G(t_1, t_2) = \iint g(t_1', t_2')\,(t_1 - t_1')^{-1/2} \exp\left[-\frac{r_0^2}{2}\,\frac{(t_2 - t_2')^2}{t_1 - t_1'}\right] dt_1'\, dt_2' \tag{23}$$

where the integration is over the strip $0 \le t_1' < t_1$. Since $h(t')$, and therefore
$g(t')$, is zero unless $t_2'^2 \le t_1'$, we may integrate over $t_2'^2 \le t_1' < t_1$. The
question to be answered next is whether, and if so, in what sense, the formal
solution (23) to (21), and therefore to (19), is a representation of $\varphi$.
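The Green's function used in (23) can be sanity-checked by finite differences. The following is a minimal sketch, assuming the kernel $K(\tau, x) = \tau^{-1/2}\exp[-r_0^2 x^2/(2\tau)]$ with $\tau = t_1 - t_1'$ and $x = t_2 - t_2'$; the helper names, the step size and the value of $r_0$ are hypothetical. It checks that $K$ satisfies the homogeneous form of (21), $\partial K/\partial\tau = (2r_0^2)^{-1}\,\partial^2 K/\partial x^2$, away from $\tau = 0$.

```python
import math


def kernel(tau, x, r0):
    """Heat kernel appearing in (23): tau^{-1/2} exp(-r0^2 x^2 / (2 tau))."""
    return tau ** -0.5 * math.exp(-r0 ** 2 * x ** 2 / (2.0 * tau))


def heat_residual(tau, x, r0, h=1e-4):
    """Central-difference estimate of dK/dtau - (1 / (2 r0^2)) d^2K/dx^2."""
    d_tau = (kernel(tau + h, x, r0) - kernel(tau - h, x, r0)) / (2.0 * h)
    d_xx = (kernel(tau, x + h, r0) - 2.0 * kernel(tau, x, r0)
            + kernel(tau, x - h, r0)) / h ** 2
    return d_tau - d_xx / (2.0 * r0 ** 2)


r0 = 0.8  # hypothetical value of r_0, for illustration only
residuals = [heat_residual(tau, x, r0)
             for tau in (0.5, 1.0, 2.0)
             for x in (-1.0, 0.0, 0.7)]
print(max(abs(v) for v in residuals))
```

The residuals are zero up to discretization and rounding error, which is consistent with reading $t_1$ as time and $t_2$ as position with diffusion coefficient $1/(2r_0^2)$.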

We shall at once study the power function of any similar test $\varphi$, since some of
the results are needed in section 7. Let $\Omega$ and $\omega$ be as defined in section 4,
Example 2. We shall assume $r_0 > 0$. As remarked in section 4, the statistic

$$T = (T_1, T_2)$$

is sufficient for $\Omega$, and it suffices therefore to consider test functions $\varphi$ which
depend only on $T$. The power function of $\varphi$ is $\beta(r, \sigma) = E_{r,\sigma}\,\varphi(T_1, T_2)$. Suppose
$\varphi$ satisfies (19), then we get after substitution:

$$\beta(r, \sigma) = \alpha + c(r, \sigma) \iint \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}t_2\right]\left(\frac{\partial^2}{\partial t_2^2} - 2r_0^2\,\frac{\partial}{\partial t_1}\right) G(t_1, t_2)\, dt_1\, dt_2 \tag{24}$$

where the integration is over $0 \le t_1 < \infty$, $-\infty < t_2 < \infty$. We may effect this
integration by taking the upper limits on $t_1$ and $t_2$ as $A$, $B$ respectively, and then
let $A \to \infty$, $B \to \infty$ in any order. With respect to the types of functions $G$ to
be considered it will not be necessary to do something similar with the lower
limit on $t_2$. If the upper limits on $t_1$ and $t_2$ are $A$ and $B$, one can integrate by
parts, obtaining an integral

$$\frac{r^2 - r_0^2}{\sigma^2} \int_0^A dt_1 \int_{-\infty}^{B} G(t_1, t_2) \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}t_2\right] dt_2 \tag{25}$$

plus the following integrated terms:

$$-2r_0^2 \int_{-\infty}^{B} G(A, t_2) \exp\left[-\frac{1}{2\sigma^2}A + \frac{r}{\sigma}t_2\right] dt_2 \tag{26}$$

$$-\frac{r}{\sigma} \int_0^A G(t_1, B) \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}B\right] dt_1 \tag{27}$$

$$\int_0^A \frac{\partial G}{\partial t_2}(t_1, B) \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}B\right] dt_1 \tag{28}$$

There is also an integral involving $G$ on the $t_2$-axis. For any $G$ given by (23),
$G(0, t_2) = 0$, so that the integral mentioned in the preceding sentence vanishes
trivially. It is sufficient, then, to consider only functions $G$ which vanish if
$t_1 = 0$. Now if $G$ is given by (23), with $g$ defined by (22) and $\varphi$ similar of size
$\alpha$, then it can be shown that (26)-(28) vanish in the limit if we let first $B \to \infty$
and then $A \to \infty$. A proof is given in Appendix 2. Using (24) and (25) it follows
that
$$\beta(r, \sigma) = \alpha + c(r, \sigma)\,\frac{r^2 - r_0^2}{\sigma^2} \lim_{A \to \infty} \lim_{B \to \infty} \int_0^A dt_1 \int_{-\infty}^{B} G(t_1, t_2) \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}t_2\right] dt_2 \tag{29}$$

We see from (29) that $\beta(r_0, \sigma) = \alpha$ identically in $\sigma$, as it should.

The reason we could get the power function in the form (29) is that in this
problem the density of $T$ is of the exponential form (3) on the whole of $\Omega$. The
exponent of the exponential factor is $-t_1/(2\sigma^2) + r t_2/\sigma$, so that on $\Omega$ we have
$s_1 = 1/(2\sigma^2)$, $s_2 = -r/\sigma$. The polynomial (16) is now defined on the whole of
$\Omega$:

$$P(s) = s_2^2 - 2r_0^2 s_1 = \frac{r^2 - r_0^2}{\sigma^2} \tag{30}$$

(On $\omega$, $r = r_0$, so $P = 0$ as it should.) We made the integrated terms (26)-(28)
vanish by taking limits in a special way. This suggests, for this problem, to
redefine the 2-dimensional Laplace transform as follows:

$$\mathcal{L}(F)(s_1, s_2) = \lim_{A \to \infty} \lim_{B \to \infty} \int_0^A dt_1 \int_{-\infty}^{B} F(t_1, t_2) \exp[-s_1 t_1 - s_2 t_2]\, dt_2 \tag{31}$$

With $P$ and $\mathcal{L}$ defined by (30) and (31), we have proved that if $\varphi$ is similar, and
$G$ is the corresponding function given by (23), then (9) is valid on the whole
of $\Omega$. Adding $\alpha$ to both sides of (9) then produces (29).

In order to characterize the whole class of similar tests, consider the class $\mathcal{C}$
of functions $G$ defined on the right half $(t_1, t_2)$ plane which satisfy the following
conditions (with $D$ defined by (17)):

(i) $DG(t_1, t_2) = 0$ if $t_2^2 > t_1$,

(ii) $-\alpha \le (t_1 - t_2^2)^{-(n-2)/2}\, DG(t_1, t_2) \le 1 - \alpha$ if $t_2^2 \le t_1$,

(iii) $G = 0$ if $t_1 = 0$, and $G(t_1, t_2) \to 0$ as $t_2 \to -\infty$, for each $t_1$,

(iv) The integrals (26)-(28) approach 0 if we let first $B \to \infty$ and then
$A \to \infty$.

For every similar size $\alpha$ test function $\varphi$ there is, by (23) and (22), a unique $G$,
satisfying the conditions (i)-(iv), so that $G \in \mathcal{C}$. Conversely, for any $G \in \mathcal{C}$ we
have shown that $\varphi$ given by (19) is similar and of size $\alpha$. Thus, there is a
one-to-one correspondence between the members of $\mathcal{C}$ and the similar size $\alpha$ test
functions. The class $\mathcal{C}$ gives therefore a complete characterization of the similar tests.
Unfortunately, condition (iv) is not a very easy one. There is an important
subclass of $\mathcal{C}$ where (iv) is obviously fulfilled, consisting of those functions $G$ in
$\mathcal{C}$ which vanish identically for $t_2 > \sqrt{t_1}$. This is the case, for instance, with all
functions $G$ leading to an invariant test. For a proof of this fact see Appendix
3. It would be desirable if (iv) could be replaced by a simpler condition. The
possibility is not excluded that conditions (i)-(iv) imply that $G(t_1, t_2) = 0$ for
all $t_2 > \sqrt{t_1}$, but whether this is so is an open question.
7. Some remarks on the search for an optimum test in the problem of section
6. Consider the class $\mathcal{C}$ defined in section 6. Let $\varphi_1$, $\varphi_2$ be two similar size $\alpha$ tests,
$G_1$, $G_2$ the corresponding functions in $\mathcal{C}$, and $\beta_1$, $\beta_2$ their power functions. It
follows from (29), since $r^2 \ge r_0^2$, that if $G_1 \ge G_2$, then $\beta_1 \ge \beta_2$, so that $\varphi_1$ is
uniformly more powerful than $\varphi_2$. Since every similar $\varphi$ has a representative
$G \in \mathcal{C}$, if there would exist a $G_0 \in \mathcal{C}$ such that $G_0 \ge G$ for every $G \in \mathcal{C}$, then the
test function $\varphi_0$ corresponding to $G_0$ would be UMP (uniformly most powerful)
among all similar tests. To decide whether or not such a dominating function
$G_0$ exists, the following observations may be of help. The first observation is
that in the problem under consideration every invariant test (that is, one
depending only on $T_2/\sqrt{T_1}$) is similar. Secondly, if we denote by $\mathcal{C}^*$ the subclass of
$\mathcal{C}$ representing invariant tests, then in $\mathcal{C}^*$ there is a function $G_0^*$ which dominates
every $G^* \in \mathcal{C}^*$. The corresponding test function $\varphi_0^*$ is therefore UMP among
all invariant tests. $\varphi_0^*$ is nonrandomized, with a rejection region of the form
$t_2/\sqrt{t_1} >$ constant. That $\varphi_0^*$ is UMP invariant is a known result [12],
obtainable more directly by the observation that $T_2/\sqrt{T_1 - T_2^2}$ has a noncentral
$t$-distribution with a monotonic likelihood ratio [3], [4], [7]. The third
observation we want to make is that if the dominating function $G_0$ exists, it has to
coincide with $G_0^*$. This follows from the following proposition: If a UMP similar
test based on $T$ exists, it is necessarily invariant. The analogous statement, with
"similar" replaced by "unbiased," is well known [5], [6]. In fact, both
statements are special cases of the following more general theorem, due to E. L.
Lehmann (private communication): Let $G$ be a group of transformations which
leaves the problem invariant, and let $\Phi$ be a class of tests which is closed under $G$.
If there is a unique UMP test in $\Phi$, it is almost invariant. (The uniqueness is
understood to be a.e.) The proof of this theorem follows the same lines as in
the special case that $\Phi$ is the class of unbiased tests of fixed size. In our problem
$\Phi$ is the class of similar tests of size $\alpha$, based on $T$. $\Phi$ is clearly closed under $G$.
If there is a UMP test in $\Phi$, its uniqueness follows from the completeness of
$T$ for $\Omega$. Finally, in our problem an almost invariant function can be shown to
be invariant (see also [17], footnote 3, and [5]).

The conclusion drawn from the preceding discussion is that there is a
dominating function $G_0 \in \mathcal{C}$ if and only if $G_0^*$ is the dominating function. Whether
or not this is so is still an open question, and consequently, it is still unknown
whether a UMP similar test exists. A last remark may be added to this. As
remarked in section 6 and proved in Appendix 3, the functions $G^*$ in $\mathcal{C}^*$ have
the remarkable property that they vanish for $t_2 \ge \sqrt{t_1}$. This property holds
then in particular for $G_0^*$. Taking into account that $G \in \mathcal{C} \Rightarrow -aG \in \mathcal{C}$ for
sufficiently small $a > 0$, we conclude that if $G_0^*$ is a dominating function in $\mathcal{C}$, then
every $G \in \mathcal{C}$ must also have the property $G(t_1, t_2) = 0$ if $t_2 \ge \sqrt{t_1}$. If this were
indeed true, then condition (iv) in section 6 could be replaced by the much
simpler condition $G(t_1, t_2) = 0$ if $t_2 \ge \sqrt{t_1}$. However, as remarked in section
6, even this property has not yet been proved.
Appendix 1. PROOF OF THEOREM 1 (Seidenberg). For the purpose of this proof
we shall replace the $s_i$ by $x_i$. Let $P_i = A_i(\theta)x_i^{d_i} + \cdots$. Let $d = \max\{d_i\}$.
Multiplying $P_i$ by $x_i^{d - d_i}$, we may suppose all the $d_i$ equal. Multiplying $P_1$ by
$A_2 \cdots A_m$, $P_2$ by $A_1 A_3 \cdots A_m$, etc., we may suppose all the $A_i$ equal.
Now we have $P_i = A(\theta)x_i^d + \cdots$, $i = 1, \ldots, m$, where $A$ is some polynomial
in $\theta_1, \ldots, \theta_k$.

Suppose we have a congruence of the form

$$\prod x_i^{r_i}\, A^{\sum r_i} \equiv R(x, \theta) \bmod (P_1, \ldots, P_m)$$

(i.e. the two sides are equal whenever all $P_i$ vanish), with $R$ a polynomial in
the $x$'s and $\theta$'s. Let $M = \max\{\deg_\theta P_i\}$. The left hand side has degree in the
$\theta$'s at most $M\sum r_i$. Assume this to be the case also for $R(x, \theta)$. Assume further
that $\deg_{x_i} R(x, \theta) \le d - 1$, $i = 1, \ldots, m$. Multiplying the congruence by
$x_1 A$, on the left we get a power product of degree $1 + \sum r_i$ in the $x_i$ times $A^{1 + \sum r_i}$.
On the right there possibly appears a power $x_1^d$: if so, we replace $A x_1^d$ by

$$(A x_1^d - P_1) \bmod P_1.$$

In this way we get a congruence

$$\prod x_i^{s_i}\, A^{\sum s_i} \equiv R'(x, \theta) \bmod (P_1, \ldots, P_m)$$

with $\sum s_i = 1 + \sum r_i$, $\deg_{x_i} R' \le d - 1$ ($i = 1, \ldots, m$), $\deg_\theta R' \le M\sum s_i$. The
congruences

$$x_i^d A^d \equiv A^{d-1}(A x_i^d - P_i) \bmod (P_1, \ldots, P_m)$$

are of the above form. Multiplying by various power products of the $x_i A$, we
again get congruences of the stated form. Let $s \ge s_0 = m(d - 1) + 1$. Then
any power product of the $x_i$ of degree $s$ must have a factor $x_i^d$ for at least one
$i$. Hence we can get a congruence of the desired form with any power product
of the $x_i A$ of degree $s$ on the left. For any such power product there may be
several congruences: choose one.

For a fixed integer $\nu \ge s_0$ (to be determined in a moment), we consider all
the power products of the $x_i A$ of degree between $s_0$ and $\nu$; and all the congruences,
one for each power product. We still multiply each of these by an appropriate
power of $A$ so that $A^\nu$ is the power of $A$ occurring on the left. On the right, then,
all polynomials are of degree $\le M\nu$ in the $\theta$'s and of degree $\le d - 1$ in each $x_i$.

Let $N(p, q)$ be the number of distinct power products of degree $p$ or less in $q$
letters. Then $N(p, q) = (p + q)(p + q - 1)\cdots(p + 1)/q!$. We are
considering, then, $N(\nu, m) - N(s_0 - 1, m)$ congruences. The right hand sides of
these congruences are linear combinations over $K$ of power products of degree
$\le M\nu$ in the $\theta$'s and of degree $\le m(d - 1)$ in the $x$'s; therefore in at most
$N(M\nu, k)\,N(m(d - 1), m)$ power products. Since

$$\deg_\nu\,[N(\nu, m) - N(s_0 - 1, m)] = m > k = \deg_\nu\, N(M\nu, k)\,N(m(d - 1), m),$$

we see that for sufficiently large $\nu$,

$$N(\nu, m) - N(s_0 - 1, m) > N(M\nu, k)\,N(m(d - 1), m).$$

Let $\nu$ be taken large enough for this to be realized. Then there exist
$c_{i_1, \ldots, i_m} \in K$, not all $= 0$, such that

$$A^\nu \sum c_{i_1, \ldots, i_m}\, x_1^{i_1} \cdots x_m^{i_m} \equiv 0 \bmod (P_1, \ldots, P_m). \quad \text{Q.E.D.}$$
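The counting step of the proof can be made concrete. The sketch below is an illustration only, with hypothetical function names: it checks the closed form for $N(p, q)$ against a brute-force enumeration of exponent vectors, and then searches for the smallest $\nu \ge s_0$ at which the number of congruences exceeds the number of available power products, assuming small illustrative values of $m$, $k$, $d$ and $M$ with $k < m$.

```python
from itertools import product
from math import comb


def count_power_products(p, q):
    """Brute-force count of monomials in q letters with total degree <= p."""
    return sum(1 for e in product(range(p + 1), repeat=q) if sum(e) <= p)


def N(p, q):
    """Closed form N(p, q) = (p+q)(p+q-1)...(p+1)/q!, a binomial coefficient."""
    return comb(p + q, q)


def smallest_nu(m, k, d, M):
    """Smallest nu >= s0 with N(nu, m) - N(s0-1, m) > N(M nu, k) N(m(d-1), m).

    The loop terminates because k < m makes the left side grow faster in nu.
    """
    s0 = m * (d - 1) + 1
    nu = s0
    while N(nu, m) - N(s0 - 1, m) <= N(M * nu, k) * N(m * (d - 1), m):
        nu += 1
    return nu


print(N(3, 2), count_power_products(3, 2))
print(smallest_nu(m=2, k=1, d=2, M=1))
```

Once the inequality holds there are more congruences than power products on the right hand sides, so a nontrivial linear combination of the left hand sides must be congruent to zero, which is the dependency the proof extracts.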
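Appendix 2. It will be proved here that the integrated terms (26)-(28) vanish
in the limit $B \to \infty$, then $A \to \infty$. Since $\varphi$ is similar and of size $\alpha$, the function
$g$ defined by (22) has the property

$$\int_0^\infty \int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} g(t_1', t_2') \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}t_2'\right] dt_2'\, dt_1' = 0. \tag{32}$$

This property is crucial for showing (26) $\to 0$, but is not needed for (27) and (28).

We shall first treat (26). Since $\sigma$ is an arbitrary positive number, we shall give
the proof with $\sigma$ replaced by $r\sigma/r_0$, which will be useful for later purposes.
With this change we substitute (23) into (26) and get

$$\begin{aligned}
\int_{-\infty}^{B} & G(A, t_2) \exp\left[-\frac{r_0^2}{r^2}\,\frac{A}{2\sigma^2} + \frac{r_0}{\sigma}t_2\right] dt_2 \\
&= \exp\left[-\frac{r_0^2}{r^2}\,\frac{A}{2\sigma^2}\right] \int_0^A dt_1' \int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} g(t_1', t_2')\, dt_2' \int_{-\infty}^{B} (A - t_1')^{-1/2} \exp\left[-\frac{r_0^2}{2}\,\frac{(t_2 - t_2')^2}{A - t_1'} + \frac{r_0}{\sigma}t_2\right] dt_2 \\
&= \frac{\sqrt{2\pi}}{r_0} \exp\left[\left(1 - \frac{r_0^2}{r^2}\right)\frac{A}{2\sigma^2}\right] \int_0^A dt_1' \int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} g(t_1', t_2') \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}t_2'\right] dt_2' \\
&\qquad \cdot \int_{-\infty}^{B} \frac{r_0}{\sqrt{2\pi}}\,(A - t_1')^{-1/2} \exp\left[-\frac{r_0^2}{2}\,\frac{\left(t_2 - (A - t_1')/(\sigma r_0) - t_2'\right)^2}{A - t_1'}\right] dt_2 .
\end{aligned}$$

The integral over $t_2$ can be written

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{B'} \exp\left[-\tfrac{1}{2}z^2\right] dz,$$

in which

$$B' = r_0\,(A - t_1')^{-1/2}\left(B - (A - t_1')/(\sigma r_0) - t_2'\right).$$

As $B \to \infty$, $B' \to \infty$ and the integral converges monotonically increasing to 1.
The integration over $t_2'$ and $t_1'$ can be considered as a double integral of the form
$\iint f_B(t_1', t_2')\, dt_1'\, dt_2'$, in which $f_B$ is bounded in absolute value by

$$|g(t_1', t_2')| \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}t_2'\right]$$

which is integrable. Applying the Lebesgue bounded (dominated) convergence
theorem, we may take the limit as $B \to \infty$ under the integral. We have

$$\lim_{B \to \infty} \iint f_B(t_1', t_2')\, dt_1'\, dt_2' = \int_0^A dt_1' \int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} g(t_1', t_2') \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}t_2'\right] dt_2'.$$

By (32) we may replace the integral on the right in the above equation by minus
the integral with same integrand but the $t_1'$ integration running from $A$ to $\infty$.
Thus we get

$$\lim_{B \to \infty} \int_{-\infty}^{B} G(A, t_2) \exp\left[-\frac{r_0^2}{r^2}\,\frac{A}{2\sigma^2} + \frac{r_0}{\sigma}t_2\right] dt_2 = \xi(A)$$

with

$$\xi(A) = -\frac{\sqrt{2\pi}}{r_0} \exp\left[\left(1 - \frac{r_0^2}{r^2}\right)\frac{A}{2\sigma^2}\right] \int_A^\infty dt_1' \int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} g(t_1', t_2') \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}t_2'\right] dt_2'.$$

Since $\varphi$ is bounded, we see by (22) that $g(t_1', t_2')$ is bounded in absolute value by
const. $h(t_1', t_2')$, which is bounded by const. $t_1'^{(n-2)/2}$. In the integration over $t_2'$
we have that

$$\int_{-\sqrt{t_1'}}^{\sqrt{t_1'}} \exp\left[\frac{r_0}{\sigma}t_2'\right] dt_2'$$

is bounded by $2\sqrt{t_1'}\,\exp[(r_0/\sigma)\sqrt{t_1'}]$. Thus

$$|\xi(A)| < \text{const.}\, \exp\left[\left(1 - \frac{r_0^2}{r^2}\right)\frac{A}{2\sigma^2}\right] \int_A^\infty t_1'^{(n-1)/2} \exp\left[-\frac{1}{2\sigma^2}t_1' + \frac{r_0}{\sigma}\sqrt{t_1'}\right] dt_1'.$$

We make the substitutions $t_1' = \sigma^2(u + r_0)^2$, $A = \sigma^2 K^2$, then $A$ and $K$ go to
$\infty$ together. Put $\xi(A) = \eta(K)$, then

$$|\eta(K)| < \text{const.}\, \exp\left[\left(1 - \frac{r_0^2}{r^2}\right)\frac{K^2}{2}\right] \int_{K - r_0}^\infty (u + r_0)^n \exp\left[-\tfrac{1}{2}u^2\right] du.$$

In the integrand, $(u + r_0)^n$ can be bounded by const. $u^n$, and by partial
integration one finds that

$$\int_{K - r_0}^\infty u^n \exp\left[-\tfrac{1}{2}u^2\right] du$$

is bounded by const. $K^{n-1} \exp[-\tfrac{1}{2}(K - r_0)^2]$. This leads then to

$$|\eta(K)| < \text{const.}\, K^{n-1} \exp\left[-\frac{r_0^2}{r^2}\,\frac{K^2}{2} + r_0 K\right]$$

which $\to 0$ as $K \to \infty$. Q.E.D.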

Of the integrated terms (27) and (28) we shall only treat (28), since (27) is a
little simpler and follows the same pattern. It can be shown that (23) can be
differentiated partially with respect to $t_2$ under the integral sign, provided $t_2^2 > t_1$.
Substituting the result into (28) we obtain, apart from a multiplicative constant,

$$\iiint (t_1 - t_1')^{-3/2}\,(B - t_2')\, g(t_1', t_2') \exp\left[-\frac{1}{2\sigma^2}t_1 + \frac{r}{\sigma}B - \frac{r_0^2}{2}\,\frac{(B - t_2')^2}{t_1 - t_1'}\right] dt_2'\, dt_1'\, dt_1 \tag{33}$$

in which the integration is over the region $t_2'^2 \le t_1' < t_1 \le A$. We shall show
that (33) $\to 0$ as $B \to \infty$ for fixed $A$, after which taking the limit $A \to \infty$ yields
then trivially 0. Clearly the integrand in (33) approaches 0 as $B \to \infty$. It suffices
therefore to show that limit and integral may be interchanged. By the Lebesgue
bounded convergence theorem it is sufficient to show that the integrand is
bounded in absolute value by an integrable function independent of $B$ (but
possibly dependent on $A$). Let $B_0 > \sqrt{A}$ and consider only values of $B \ge B_0$.
The integrand is bounded in absolute value by $|g(t_1', t_2')|\, f_1 f_2 f_3$, in which

$$f_1 = \exp\left[\frac{r}{\sigma}B - \frac{r_0^2}{4}\,\frac{(B - t_2')^2}{t_1 - t_1'}\right],\quad f_2 = \left(\frac{B - t_2'}{\sqrt{t_1 - t_1'}}\right)^3 \exp\left[-\frac{r_0^2}{4}\left(\frac{B - t_2'}{\sqrt{t_1 - t_1'}}\right)^2\right],$$

and $f_3 = (B - t_2')^{-2}$. Now $f_1$ is bounded by

$$\exp\left[\frac{r}{\sigma}B - \frac{r_0^2}{4}\,\frac{(B - \sqrt{A})^2}{A}\right]$$

which is bounded by a constant; $f_2$ is of the form $y^3 \exp[-(r_0^2/4)y^2]$ and is
therefore also bounded by a constant; $f_3$ is bounded by the constant $(B_0 - \sqrt{A})^{-2}$.
Finally we have then that the integrand in (33) is bounded in absolute value by
const. $|g(t_1', t_2')|$, which is integrable over the bounded region $t_2'^2 \le t_1' < t_1 \le A$.
Q.E.D.


Appendix 3. We will show that if $\varphi$ is invariant, then $G = 0$ in the region
$t_2 > \sqrt{t_1}$. Let $y = t_2/\sqrt{t_1}$ and $y' = t_2'/\sqrt{t_1'}$. If $\varphi$ is invariant, it is a function
of $y$ only. Put $\varphi(t_1, t_2) = \varphi^*(y)$, so that by (22) and (18)

$$g(t_1, t_2) = \text{const.}\; t_1^{(n-2)/2}(1 - y^2)^{(n-2)/2}(\varphi^*(y) - \alpha).$$

After substitution into (23) and making the change of variable $\tau = t_1'/t_1$,
we can write (23) as

$$G(t_1, t_2) = \text{const.}\; t_1^{n/2} \exp\left[-\frac{r_0^2}{2}y^2\right] \int_{-1}^{1} (1 - y'^2)^{(n-2)/2}(\varphi^*(y') - \alpha)\, f(y, y')\, dy', \tag{34}$$

in which

$$f(y, y') = \int_0^1 \tau^{(n-1)/2}(1 - \tau)^{-1/2} \exp\left[-\frac{r_0^2}{2}\,\frac{y^2\tau - 2yy'\sqrt{\tau} + y'^2\tau}{1 - \tau}\right] d\tau. \tag{35}$$

Throughout we restrict $y$ and $y'$ to $y > 1$, $y' \le 1$. Let the differential operator
$D_y$ be defined as

$$D_y = \frac{1}{r_0^2}\,\frac{\partial^2}{\partial y^2} - y\,\frac{\partial}{\partial y} - (n + 1) \tag{36}$$

and the operator $D_{y'}$ similarly by replacing in (36) $y$ by $y'$. Then $f$ satisfies the
two equations

$$D_y f(y, y') = 0 \tag{37}$$

$$D_{y'} f(y, y') = 0. \tag{38}$$

Furthermore, it can be seen from (35) that $f \to 0$ if $y \to \infty$ for fixed $y'$, or if
$y' \to -\infty$ for fixed $y$. Two linearly independent solutions of the equation

$$D_y u(y) = 0 \tag{39}$$

are $u_1$ and $u_2$, with $u_2(y) = u_1(-y)$, and

$$u_1(y) = \int_0^\infty t^{(n-1)/2} \exp\left[-\tfrac{1}{2}t + r_0\sqrt{t}\,y\right] dt. \tag{40}$$

When $y \to \infty$, $u_1(y) \to \infty$ whereas $u_2(y) \to 0$, with the opposite behavior as
$y \to -\infty$. It follows from (37) and (38), from the behavior of the functions $u_1$
and $u_2$, and from the behavior of $f$ as $y \to \infty$ or $y' \to -\infty$, that $f$ must equal

$$f(y, y') = \text{const.}\; u_1(y')\,u_2(y). \tag{41}$$

Substituting (41) into (34), it remains to be shown that the integral

$$\int_{-1}^{1} (1 - y'^2)^{(n-2)/2}(\varphi^*(y') - \alpha)\,u_1(y')\, dy' \tag{42}$$

equals 0, with $u_1$ given by (40). Replacing in (40) $t$ by $t_1'$, and in (42) $y'$ by $t_2'/\sqrt{t_1'}$,
the integral (42) is nothing else but the expectation of $\varphi - \alpha$ with respect to
the distribution specified by $r = r_0$, $\sigma = 1$. Since $\varphi$ is similar of size $\alpha$ this
expectation vanishes. Q.E.D.
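The differential equation (39) for $u_1$ can be checked numerically. The sketch below rests on two stated assumptions: that the operator is $D_y = r_0^{-2}\,\partial^2/\partial y^2 - y\,\partial/\partial y - (n+1)$, and that the power of $t$ in (40) is $(n-1)/2$, which is what the density $h(t)\exp[-t_1/2 + r_0 t_2]$ with $h$ from (18) gives after the change of variable $t_2 = y\sqrt{t_1}$. With $t = v^2$ the integral and its $y$-derivatives become moments of $v$, computed here by a simple Simpson rule; the function names, the truncation point and the parameter values are hypothetical.

```python
import math


def moment(k, y, r0, upper=40.0, steps=4000):
    """Simpson approximation of the integral of v^k exp(-v^2/2 + r0 v y) over [0, upper]."""
    h = upper / steps

    def f(v):
        return v ** k * math.exp(-0.5 * v * v + r0 * v * y)

    total = f(0.0) + f(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0


def u1_and_residual(n, y, r0):
    """Return u1(y) and the residual of (1/r0^2) u'' - y u' - (n + 1) u."""
    u = 2.0 * moment(n, y, r0)                     # u1 after substituting t = v^2
    du = 2.0 * r0 * moment(n + 1, y, r0)           # derivative in y under the integral
    d2u = 2.0 * r0 ** 2 * moment(n + 2, y, r0)     # second derivative in y
    return u, d2u / r0 ** 2 - y * du - (n + 1) * u


for y in (-1.0, 0.0, 1.5):
    u, res = u1_and_residual(n=3, y=y, r0=0.8)
    print(y, u, res / u)
```

The relative residual is zero up to quadrature error, because the combined integrand is the exact derivative of $-v^{n+1}\exp[-v^2/2 + r_0 v y]$, which vanishes at both ends of the range.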





A METHOD OF GENERATING BEST ASYMPTOTICALLY NORMAL
ESTIMATES WITH APPLICATION TO THE ESTIMATION OF
BACTERIAL DENSITIES¹


By THOMAS S. FERGUSON²
University of California, Berkeley


0. Summary. Various minimum $\chi^2$ methods used for generating B.A.N.
estimates are summarized, and a new method which generates B.A.N. estimates 
as roots of certain linear forms is introduced and investigated. As a particular 
application of the method, the estimation of the bacterial density in an experi- 
ment using dilution series is considered. 


1. Introduction. The purpose of the present paper is to describe a simple
method by which estimates having the usual asymptotic properties of Best
Asymptotically Normal (B.A.N.) estimates can be obtained.

Originally B.A.N. estimates were introduced by J. Neyman [1] for a situation
in which the underlying probability distributions have a multinomial-like
structure. This was followed by a paper by E. W. Barankin and J. Gurland [2] who
extended the class of estimation problems for which B.A.N. estimates could be 
used and also described quite general methods of generation of such estimates. 
Other results in this direction have been obtained by C. L. Chiang [3] and L. Le 
Cam [4] and W. Taylor [5]. 

A best asymptotically normal estimate $\theta^*$ of a parameter $\theta$ is, loosely speaking,
one which is asymptotically normally distributed about the true parameter
value, and which is best in the sense that out of all such asymptotically normal
estimates it has the least possible asymptotic variance. Thus a B.A.N. estimate
will be asymptotically the "most accurate" estimate of a parameter; but the
value to a statistician of obtaining such estimates is even greater than is indicated
by this. In the aforementioned paper of Neyman, a simple method of testing
hypotheses is described which is asymptotically equivalent to the likelihood
ratio test and involves the use of the $\chi^2$ statistic and a B.A.N. estimate. It
usually turns out that the hardest work in applying this technique is in comput- 
ing the estimate. Thus it is important to have a number of different methods for 
computing B.A.N. estimates available to the applied statistician. The usual 
methods of obtaining B.A.N. estimates will be summarized briefly in section 2. 

The objective of all these methods is at least in part a practical one and is 
essentially two-fold. First, it is hoped that some of these estimates will be 
easily computable. Second, even though all these estimates have the same 
asymptotic properties, they may differ widely in their small sample properties,
and it seems reasonable that the choice of the proper estimate to use should 
depend in part on the behavior of the estimate for small samples. As a conse- 
quence, a large class of estimates with best asymptotic properties is proposed 
with the hope that some of the easily computable estimates will have small 
sample properties which are reasonably good. Blind adherence to the principle of 
maximum likelihood, for example, may lead to more difficult computations and 
still yield less accurate estimates than other methods of estimation. 

A new approach to generating B.A.N. estimates as roots of linear forms of 
certain variables is suggested in section 3. In cases where minimum distance 
methods are applicable, the procedure proposed here leads to estimates which are 
solutions of equations obtained by simplifying in a suitable manner the equations 
obtained by the original methods. By way of an example, section 4 contains an 
application of this approach to the problem of estimation of bacterial density. 


2. A review of the minimum $\chi^2$ methods of generating B.A.N. estimates.
Since the following methods are to be found in the literature at various levels of 
generalities, a complete mathematical description of the hypotheses necessary 
for their validity will be omitted. 

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of independent identically distributed
$s$-dimensional random vectors whose distribution depends upon a parameter $\theta$
belonging to an open subset $\Theta$ of $k$-dimensional Euclidean space $R_k$, with $k \le s$.
Let $P(\theta) = E(X \mid \theta)$ be the $s$-dimensional vector of the expectations of the vector
$X_n$, and let $\Sigma(\theta) = \mathrm{var}(X \mid \theta) = E\{[X - P(\theta)][X - P(\theta)]'\}$ be the $s \times s$
covariance matrix which is assumed to be finite and non-singular for each $\theta \in \Theta$.
Furthermore, it is assumed that $P(\theta)$ is a one-to-one bicontinuous map from $\Theta$ to
a subset of $s$-dimensional Euclidean space with continuous partial derivatives of
the second order. Let $Z_n$ be the $s$-dimensional random vector defined by
$nZ_n = \sum_{i=1}^{n} X_i$.

The quadratic form

$$n[Z_n - P(\theta)]'\,\Sigma(\theta)^{-1}\,[Z_n - P(\theta)] \tag{2.1}$$

will be designated by the name of $\chi^2$. The value $\hat\theta(Z_n)$ of $\theta$ which minimizes this
quadratic form will be called the minimum $\chi^2$ estimate of $\theta$. As an example take
the multinomial case where there are $n$ independent trials each capable of
producing any of $s + 1$ possible outcomes. Let the probability on each successive
trial be $p_i(\theta)$ of producing the $i$th outcome. Let $z_i$ denote the proportion of the
trials which result in the $i$th outcome. Then

$$n \sum_{i=1}^{s+1} \frac{(z_i - p_i(\theta))^2}{p_i(\theta)} \tag{2.2}$$

is the familiar Pearson $\chi^2$. It may be shown that (2.2) is algebraically equal to the
$\chi^2$ of the form (2.1) where the vector $Z_n$ is the vector of the first $s$ $z_i$'s. The
advantage of (2.1) lies in the fact that it describes a method for estimating
parameters of continuous distributions.
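The algebraic identity between (2.2) and (2.1) in the multinomial case can be checked on a small example. The following is a minimal sketch, assuming that $X$ is the indicator vector of the first $s$ outcomes of a single trial, so that $\Sigma(\theta)$ has entries $p_i\delta_{ij} - p_i p_j$; the helper names, the cell probabilities, the observed proportions and the sample size are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pearson_chi2(n, z, p):
    """Pearson chi-square (2.2), summed over all s + 1 cells."""
    return n * sum((zi - pi) ** 2 / pi for zi, pi in zip(z, p))


def quadratic_chi2(n, z, p):
    """Quadratic form (2.1), using only the first s cells."""
    s = len(p) - 1
    diff = [z[i] - p[i] for i in range(s)]
    cov = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(s)]
           for i in range(s)]
    w = solve(cov, diff)  # w = Sigma^{-1} (Z_n - P)
    return n * sum(d * wi for d, wi in zip(diff, w))


p = [0.2, 0.3, 0.4, 0.1]       # hypothetical cell probabilities p_i(theta)
z = [0.25, 0.25, 0.35, 0.15]   # hypothetical observed proportions
print(pearson_chi2(200, z, p), quadratic_chi2(200, z, p))
```

Dropping the last cell loses nothing because the proportions sum to one; the inverse covariance of the first $s$ cells restores the missing term $(z_{s+1} - p_{s+1})^2/p_{s+1}$.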

Barankin and Gurland [2] have shown that the minimum $\chi^2$ estimate, as
defined above, is B.A.N. where the $X_n$ have distributions belonging to a
Koopman's family, and $Z_n$ is a vector of sufficient statistics. When the distributions
under consideration do not form a Koopman's family with sufficient statistics
$Z_n$, the term B.A.N. estimate is perhaps not entirely justifiable but will be
retained for convenience. The precise definition of B.A.N. estimate to
be adopted is somewhat irrelevant, because the methods reviewed in this
section and the method developed in section 3 give estimates which have the same
asymptotic behavior as the minimum $\chi^2$ estimates. In section 3.3, the sense in
which the estimates are best is stated more precisely.

Starting with this basic minimum $\chi^2$ estimate, several methods may be used
to generate large classes of estimates. These methods will be described below. 
Method I is due essentially to Karl Pearson. Method II as a general method 
may be found in Barankin and Gurland [2] and Taylor [5], but special cases 
were used earlier (see Berkson [6]). Method III evolved from practical work 
and is of unknown authorship. Method IV is due to Neyman [1]. 


Method I. Modification. Let $M_n(Z_n, \theta)$ be an $s \times s$ symmetric positive definite matrix. The quadratic form

$$Q_n(\theta) = n[Z_n - P(\theta)]'\, M_n(Z_n, \theta)\, [Z_n - P(\theta)] \tag{2.3}$$

will be called the modified or reduced $\chi^2$. The estimate $\hat\theta_M(Z_n)$ which minimizes the modified $\chi^2$ with the function $M_n(Z_n, \theta)$ depending only on $Z_n$ and not on $\theta$ or $n$, will be called the minimum modified $\chi^2$ estimate of $\theta$. For example, the estimate which minimizes the Pearson modified $\chi^2$,

$$\chi^2_M = n \sum_{i=1}^{s+1} \frac{(z_i - p_i(\theta))^2}{z_i} \tag{2.4}$$

is such an estimate.

Under the condition that $M_n(Z_n, \theta) \to \Sigma^{-1}(\theta)$ in probability as $n \to \infty$ when $\theta$ is the true value of the parameter, and under certain regularity conditions, the minimum modified $\chi^2$ estimate of $\theta$ will have the same asymptotic properties as the minimum $\chi^2$ estimate of $\theta$ (or simply, $\hat\theta_M(Z_n)$ will be B.A.N., according to the conventions made).
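
The two criteria (2.2) and (2.4) can be compared numerically. The following is a minimal sketch, not the author's procedure: the one-parameter trinomial model `cell_probs`, the observed frequencies, and the crude grid search are all assumptions chosen only for illustration.

```python
def pearson_chi2(z, p, n):
    """Pearson chi-square (2.2): n * sum_i (z_i - p_i)^2 / p_i."""
    return n * sum((zi - pi) ** 2 / pi for zi, pi in zip(z, p))


def modified_chi2(z, p, n):
    """Modified chi-square (2.4): the observed z_i replaces p_i in the denominator."""
    return n * sum((zi - pi) ** 2 / zi for zi, pi in zip(z, p))


def cell_probs(theta):
    """Hypothetical one-parameter trinomial model (an assumption, not from the paper)."""
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)


def grid_minimizer(criterion, z, n, steps=1000):
    """Return the grid point theta = i/steps, 0 < i < steps, minimizing the criterion."""
    grid = (i / steps for i in range(1, steps))
    return min(grid, key=lambda t: criterion(z, cell_probs(t), n))


z = (0.18, 0.46, 0.36)  # hypothetical observed cell frequencies
n = 200                 # hypothetical number of trials
theta_pearson = grid_minimizer(pearson_chi2, z, n)
theta_modified = grid_minimizer(modified_chi2, z, n)
```

With data close to the model the two minimizers nearly coincide, which is the behavior the asymptotic statement above leads one to expect.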


Method II. Transformation. Let $g(x)$ be any function from $R_s$ to $R_s$ with continuous first partial derivatives,

$$g(x) = \begin{pmatrix} g_1(x_1, \dots, x_s) \\ \vdots \\ g_s(x_1, \dots, x_s) \end{pmatrix}. \tag{2.5}$$

Let the $s \times s$ matrix of first partial derivatives be denoted by

$$\dot g(x) = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_s}{\partial x_1} \\ \vdots & & \vdots \\ \dfrac{\partial g_1}{\partial x_s} & \cdots & \dfrac{\partial g_s}{\partial x_s} \end{pmatrix}. \tag{2.6}$$

We shall call the quadratic form

$$n[g(Z_n) - g(P(\theta))]'\,[\dot g(P(\theta))\,\Sigma(\theta)\,\dot g(P(\theta))']^{-1}\,[g(Z_n) - g(P(\theta))] \tag{2.7}$$

the transformed $\chi^2$. More generally, we may consider combinations of Methods I and II and replace the matrix of the quadratic form (2.7) by an estimate,

$$Q_n(\theta) = n[g(Z_n) - g(P(\theta))]'\, M_n(Z_n, \theta)\, [g(Z_n) - g(P(\theta))]. \tag{2.8}$$

We assume that $M_n(Z_n, \theta) \to [\dot g(P(\theta))\,\Sigma(\theta)\,\dot g(P(\theta))']^{-1}$ in probability and the regularity conditions needed for Method I. In addition, one needs regularity conditions on $g$, namely that $g$ is a one-to-one bicontinuous map from a neighborhood of $P(\theta)$ into $R_s$, with continuous partial derivatives of the second order, and that the matrix $\dot g(P(\theta))$ is nonsingular for each $\theta \in \Theta$. Then the minimum transformed $\chi^2$ estimate, that is, the value $\hat\theta_T(Z_n)$ of $\theta$ minimizing (2.7), will be a B.A.N. estimate of $\theta$.

This method of generating B.A.N. estimates also applies to the $\chi^2$ of (2.2); for example, letting $g_i(x)$ be the real-valued transformation applied to the $i$th cell,

$$\chi^2 = n \sum_{i=1}^{s+1} \frac{(g_i(z_i) - g_i(p_i(\theta)))^2}{p_i(\theta)\,[g_i'(p_i(\theta))]^2} \tag{2.9}$$

or modified,

$$\chi^2 = n \sum_{i=1}^{s+1} \frac{(g_i(z_i) - g_i(p_i(\theta)))^2}{z_i\,[g_i'(z_i)]^2}. \tag{2.10}$$

The well-known example of Berkson [6] is of the type (2.10).

Many times the functions $g_i$ may be chosen so that $g_i(p_i(\theta))$ is a linear function of the parameters $\theta_1, \dots, \theta_k$. In such cases finding the value of $\theta$ which minimizes the $\chi^2$ of equation (2.10) results in solving $k$ linear equations in $k$ unknowns. The reader may consult the paper of W. Taylor [5] for examples.
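
To illustrate how a linearizing transformation reduces the minimization to linear equations, here is a minimal sketch based on the minimum logit $\chi^2$ for independent binomial groups, taken here in the commonly quoted form $\sum_i n_i z_i(1 - z_i)(\operatorname{logit} z_i - \alpha - \beta x_i)^2$. The function name, the two-parameter linear model, and the handling of groups with $z_i = 0$ or $1$ are assumptions made for this example only.

```python
import math


def min_logit_chi2(x, n, r):
    """Minimize sum_i n_i z_i (1 - z_i) (logit(z_i) - alpha - beta * x_i)^2.

    x: dose (or covariate) per group, n: group sizes, r: responders per group.
    The minimizer solves two linear normal equations, handled by Cramer's rule.
    """
    sw = swx = swxx = swy = swxy = 0.0
    for xi, ni, ri in zip(x, n, r):
        z = ri / ni
        if z <= 0.0 or z >= 1.0:
            continue  # logit undefined; such groups are simply skipped in this sketch
        w = ni * z * (1.0 - z)           # weight n_i z_i (1 - z_i)
        y = math.log(z / (1.0 - z))      # observed logit
        sw += w
        swx += w * xi
        swxx += w * xi * xi
        swy += w * y
        swxy += w * xi * y
    det = sw * swxx - swx ** 2
    alpha = (swy * swxx - swx * swxy) / det
    beta = (sw * swxy - swx * swy) / det
    return alpha, beta


# Hypothetical data: four dose levels, 50 subjects each.
alpha_hat, beta_hat = min_logit_chi2([0.0, 1.0, 2.0, 3.0], [50, 50, 50, 50], [10, 18, 29, 41])
```

No iteration is needed: the weights depend only on the observed frequencies, so the criterion is an ordinary weighted least squares problem in the transformed scale.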


Method III. Expansion in a Taylor series about a $O(\sqrt{n})$-consistent estimate. An estimate $\theta_n^*$ of $\theta$ will be called $O(\sqrt{n})$-consistent if $\sqrt{n}(\theta_n^* - \theta)$ is bounded in probability uniformly in $n$ when $\theta$ is the true value of the parameter; that is, for every $\epsilon > 0$ and $\theta \in \Theta$, there exists a number $B$ so large that for every $n = 1, 2, \dots$

$$P\{\sqrt{n}\,|\theta_n^* - \theta| > B \mid \theta\} < \epsilon. \tag{2.11}$$

Many types of estimates satisfy this requirement. For example, under certain regularity conditions, estimation by the method of moments yields estimates $\theta_n^*$ for which $\sqrt{n}(\theta_n^* - \theta)$ is asymptotically normal when $\theta$ is the true value of the parameter. This follows from a theorem of Cramér [7], p. 366, which states that certain functions of the moments are asymptotically normal. Such asymptotically normal estimates as this are obviously $O(\sqrt{n})$-consistent.

One may try to apply a correction to $\theta_n^*$ by an application of the method of expansion in a Taylor series to get an estimate closer to the minimum $\chi^2$ estimate. It is known, however, that one such application to a $O(\sqrt{n})$-consistent estimate will give a B.A.N. estimate. More specifically, consider the expansion of some one of the previously mentioned $\chi^2$'s (modified and/or transformed) in a Taylor series to the second degree terms about a $O(\sqrt{n})$-consistent estimate $\theta_n^*$ of $\theta$.


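$$\chi^2(\theta) = \chi^2(\theta_n^*) + \dot\chi^2(\theta_n^*)'(\theta - \theta_n^*) + \tfrac{1}{2}(\theta - \theta_n^*)'\,\ddot\chi^2(\theta_n^*)\,(\theta - \theta_n^*) + \text{Rem.} \tag{2.12}$$

where $\dot\chi^2(\theta)$ is the $k \times 1$ vector of first derivatives of $\chi^2(\theta)$ and $\ddot\chi^2(\theta)$ is the $k \times k$ matrix of second derivatives of $\chi^2(\theta)$,

$$\dot\chi^2(\theta) = \begin{pmatrix} \dfrac{\partial}{\partial\theta_1}\chi^2(\theta) \\ \vdots \\ \dfrac{\partial}{\partial\theta_k}\chi^2(\theta) \end{pmatrix}, \tag{2.13}$$

$$\ddot\chi^2(\theta) = \begin{pmatrix} \dfrac{\partial^2}{\partial\theta_1^2}\chi^2(\theta) & \cdots & \dfrac{\partial^2}{\partial\theta_1\,\partial\theta_k}\chi^2(\theta) \\ \vdots & & \vdots \\ \dfrac{\partial^2}{\partial\theta_k\,\partial\theta_1}\chi^2(\theta) & \cdots & \dfrac{\partial^2}{\partial\theta_k^2}\chi^2(\theta) \end{pmatrix}. \tag{2.14}$$

Instead of finding that value of $\theta$ which minimizes $\chi^2(\theta)$, one may discard the remainder term and find that value $\hat\theta_n$ of $\theta$ which minimizes the first three terms of the expansion. This estimate $\hat\theta_n$ will then be a B.A.N. estimate of $\theta$. This method of generating B.A.N. estimates is important because it leads to $k$ linear equations in $k$ unknowns and is thus comparatively easy to apply.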


Method IV. Linearization of the side conditions. This method, due to Neyman [1], was proposed with the specific intention of finding a B.A.N. estimate which could be computed by solving linear equations. In minimizing some $\chi^2$ like (2.1), one may consider the vector $P$ as the vector of parameters which are subject to certain restrictions, called side conditions, due to the dependence of $P$ on $\theta$. If there are $s$ independent components of the vector $P$ and $k$ parameters, there will be $s - k$ side conditions on the $p$'s,

$$f_j(p_1, \dots, p_s) = 0 \quad \text{for } j = 1, \dots, s - k. \tag{2.15}$$

One may then minimize $\chi^2$ subject to these side conditions by the method of Lagrange multipliers. However, a simpler procedure would be to minimize $\chi^2$ subject to the linearized counterpart of (2.15), that is, the first two terms of the Taylor series expansion about the point $Z_n$. The solution for the estimate then only requires solution of linear equations. For a fuller account of the subject, the reader should consult the papers of Neyman and of Barankin and Gurland. The outline of the method given here is added only for the sake of completeness and no mention of the method will be made in the later sections of the paper.


3. B.A.N. estimates as roots of linear forms. The method customarily used to find a minimum $\chi^2$ estimate is to differentiate $\chi^2$ with respect to each of the parameters separately, set the results equal to zero and solve the resulting system of simultaneous equations. For example, one may differentiate the $\chi^2$ of the equation (2.4) and obtain the equations

$$-2n \sum_{i=1}^{s+1} \frac{(z_i - p_i(\theta))}{z_i}\,\frac{\partial p_i(\theta)}{\partial\theta_j} = 0 \quad \text{for } j = 1, 2, \dots, k, \tag{3.1}$$

or one may differentiate the $\chi^2$ of equation (2.3) with $M_n(Z_n, \theta)$ a function of $Z_n$ only, such that $M_n(Z_n) \to \Sigma^{-1}(\theta)$ in probability and the regularity conditions hold, and obtain

$$-n\dot P(\theta)\,M(Z_n)\,(Z_n - P(\theta)) = 0 \tag{3.2}$$

where $\dot P(\theta)$ is the $k \times s$ matrix of first partial derivatives of the vector $P(\theta)$,

$$\dot P(\theta) = \begin{pmatrix} \dfrac{\partial P_1(\theta)}{\partial\theta_1} & \cdots & \dfrac{\partial P_s(\theta)}{\partial\theta_1} \\ \vdots & & \vdots \\ \dfrac{\partial P_1(\theta)}{\partial\theta_k} & \cdots & \dfrac{\partial P_s(\theta)}{\partial\theta_k} \end{pmatrix} \tag{3.3}$$

and the $0$ is the $k \times 1$ vector with a zero term in each component so that (3.2) represents $k$ equations in $k$ unknowns.

Well-chosen roots to equations such as (3.1) and (3.2) are B.A.N. estimates of the unknown parameters. This suggests that instead of starting with a quadratic form in $(Z_n - P(\theta))$ and finding values of $\theta$ which make the form a minimum, it may be simpler to start with an arbitrary linear form in $(Z_n - P(\theta))$ and find the roots. Roots of certain such linear forms, namely, (3.1) and (3.2), will be B.A.N. estimates. Furthermore, such a method of generating B.A.N. estimates will probably satisfy the requirement that they be easy to compute. It is the purpose of this section to investigate the asymptotic distribution of roots of linear forms in $(Z_n - P(\theta))$, and the conditions for such roots to be B.A.N. estimates of the parameters.


3.1. Preliminary lemma. This section contains an implicit function theorem needed for the proof of the main theorem. First an implicit function theorem which can be found in Pierpont [8], p. 293, for example, is stated, from which the lemma of this section will follow. The unicity of the implicit function is stated in a somewhat stronger form than found in Pierpont. This strengthening can be obtained by modifying his proof slightly and the details of the proof need not be given here.

Let $F(x, u)$ be a function of variables $x \in R_s$ and $u \in R_k$ with values in $R_k$. Let $a \in R_s$ and $b \in R_k$, and assume that

(i) $F(x, u)$ is continuous and $F_u(x, u)$ exists and is continuous in a neighborhood of the point $(a, b)$.

(ii) $F(a, b) = 0$ and $F_u(a, b)$ is nonsingular. Then, there exists a neighborhood $N$ of $a$, and a function $\phi(x)$ from $R_s$ to $R_k$, such that

(1) $\phi(x)$ is continuous in $N$,

(2) $\phi(a) = b$,

(3) $F(x, \phi(x)) = 0$ for $x \in N$, and

(4) (uniqueness) there exists a neighborhood $N'$ of the point $b$ such that for $u \in N'$ and $x \in N$, $F(x, u) \neq 0$ unless $u = \phi(x)$.

In the above theorem $F_u(x, u)$ represents the $k \times k$ matrix of partial derivatives of $F(x, u)$ with respect to $u$, as in equation (3.3). The assumption of continuity of $F_u(x, u)$ means that each component of the matrix is assumed to be continuous.

The following lemma is an extension of this theorem, similar to that found in Graves [9], p. 144, to the situation in which $F(x, u)$ is known to vanish along some curve in $R_{s+k}$, rather than just at one point.

LEMMA. Let $F(x, u)$ be a function of variables $x \in R_s$ and $u \in R_k$ with values in $R_k$, $k \le s$. Let $p(u)$ be a function from some set $D \subset R_k$ to $R_s$, and assume that

(i) $D$ is an open set,

(ii) $p(u)$ is one-to-one and inversely continuous from $D$ into $R_s$,

(iii) there is a neighborhood of the curve $\{(p(u), u) : u \in D\}$ in which $F(x, u)$ is continuous and $F_u(x, u)$ exists and is continuous,

(iv) $F(p(u), u) = 0$ and $F_u(p(u), u)$ is nonsingular for every $u \in D$.

Then, there exists a neighborhood $N$ of the set $S = \{p(u) : u \in D\}$ and a function $\phi(x)$ from $R_s$ to $R_k$ such that

(a) $\phi(x)$ is continuous in $N$,

(b) $\phi(p(u)) = u$ for $u \in D$,

(c) $F(x, \phi(x)) = 0$ for $x \in N$, and

(d) there exists a neighborhood of the curve $\{(p(u), u) : u \in D\}$ in which the only zeros of the function $F(x, u)$ are the points $(x, \phi(x))$.

PROOF. From the previous implicit function theorem, for every $u \in D$, there is a neighborhood $N_{p(u)}$ of the point $p(u)$ and a function $\phi_u(x)$ from $R_s$ to $R_k$ such that

(1) $\phi_u(x)$ is continuous in $N_{p(u)}$,

(2) $\phi_u(p(u)) = u$,

(3) $F(x, \phi_u(x)) = 0$ for $x \in N_{p(u)}$, and

(4) for $y$ in some neighborhood $N_u$ of the point $u$, and $x \in N_{p(u)}$, $F(x, y) \neq 0$ unless $y = \phi_u(x)$.

Using the inverse continuity of the function $p(u)$, and the continuity of the function $\phi_u(x)$, we may replace the neighborhoods $N_{p(u)}$ by spherical neighborhoods $N'_{p(u)}$ with the two additional properties that

(5) if $p(u_1) \in N'_{p(u_2)}$ for some $u_1$ and $u_2 \in D$, then $u_1 \in N_{u_2}$, and

(6) if $x \in N'_{p(u)}$ for some $u \in D$, then $\phi_u(x) \in N_u$.

Now consider spherical neighborhoods $N''_{p(u)}$ with radii equal to $\tfrac{1}{2}$ that of $N'_{p(u)}$, and let $N$ denote $\bigcup_{u \in D} N''_{p(u)}$. The set $N$ is then obviously a neighborhood of the set $S$.
We will show that if $x_0 \in N''_{p(u_1)} \cap N''_{p(u_2)}$, then $\phi_{u_1}(x_0) = \phi_{u_2}(x_0)$. For since $N''_{p(u_1)} \cap N''_{p(u_2)}$ is not empty, either $p(u_1) \in N'_{p(u_2)}$ or $p(u_2) \in N'_{p(u_1)}$. Suppose without loss of generality that the former is true; then since $u_1 \in N_{u_2}$ and

$$F(p(u_1), \phi_{u_2}(p(u_1))) = 0,$$

we have $\phi_{u_2}(p(u_1)) = u_1$. Furthermore, for $x \in N''_{p(u_1)} \cap N''_{p(u_2)}$, $\phi_{u_2}(x)$ is continuous and satisfies $F(x, \phi_{u_2}(x)) = 0$; but $\phi_{u_1}(x) \in N_{u_1}$ for $x \in N'_{p(u_1)}$ and thus $\phi_{u_1}(x)$ is the unique function, continuous in $N'_{p(u_1)}$ and such that

$$\phi_{u_1}(p(u_1)) = u_1 \quad \text{and} \quad F(x, \phi_{u_1}(x)) = 0.$$

Hence, $\phi_{u_1}(x_0) = \phi_{u_2}(x_0)$.

Thus for $x \in N$ we may define $\phi(x) = \phi_u(x)$ for any $u$ for which $x \in N''_{p(u)}$, since such a definition is unique. Now parts (a), (b), and (c) of the conclusion of the lemma are obvious. As for (d), the neighborhood can be chosen to be $\bigcup_{u \in D} [N''_{p(u)} \times N_u]$.


3.2. The main theorem. Let $Z_n$, $n = 1, 2, \dots$ be a sequence of $s$-dimensional random vectors whose distribution depends upon a parameter $\theta$ in some set $\Theta \subset R_k$, $k \le s$. Let $P(\theta)$ be a function from $\Theta$ to $R_s$.

ASSUMPTION 1. $\Theta$ is an open set.

ASSUMPTION 2. $\mathcal{L}\{\sqrt{n}(Z_n - P(\theta)) \mid \theta\} \to \mathcal{L}(Z)$ where $Z$ is a normal random vector with mean zero and variance-covariance matrix $\Sigma(\theta)$. (That is, $EZ = 0$, $EZZ' = \Sigma(\theta)$.)

The convergence used above is convergence in law or in distribution. Assumption 2 states that when $\theta$ is the true value of the parameter, the distribution of $\sqrt{n}(Z_n - P(\theta))$ converges to a normal distribution with mean zero and variance-covariance matrix $\Sigma(\theta)$. The law degenerate at some point $a$ will be denoted by $\mathcal{L}(a)$. Thus $\mathcal{L}(X_n) \to \mathcal{L}(a)$ means that $X_n$ converges in probability to $a$.

ASSUMPTION 3. The mapping $P(\theta)$ from $\Theta$ into $R_s$ is homeomorphic (that is, one-to-one and bicontinuous) and continuously differentiable.

Let $f(x, \theta)$ be a $k \times s$ matrix for each $x \in R_s$ and $\theta \in \Theta$.

ASSUMPTION 4. There is a neighborhood $N_0 \subset R_s \times \Theta$ of the set

$$\{(P(\theta), \theta) : \theta \in \Theta\}$$

within which $f(x, \theta)$ and $\partial/\partial\theta_j\, f(x, \theta)$ for $j = 1, 2, \dots, k$ are continuous jointly in $(x, \theta)$.

Let $b(\theta) = f(P(\theta), \theta)$ and let $\dot P(\theta)$ be the $k \times s$ matrix of partial derivatives of $P(\theta)$, given by equation (3.3).

ASSUMPTION 5. The matrix $\dot P(\theta)\,b(\theta)'$ is nonsingular for each $\theta \in \Theta$.

Let

$$F(x, \theta) = f(x, \theta)(x - P(\theta)). \tag{3.4}$$

This is the linear form which will be used in the sequel to generate B.A.N. estimates of the parameter $\theta$. The following theorem shows immediately that the root to the equation $F(Z_n, \theta) = 0$ will be a $O(\sqrt{n})$-consistent estimate of $\theta$.

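THEOREM 1. Under assumptions 1 through 5, there exists a neighborhood $N$ of the set $S = \{P(\theta) : \theta \in \Theta\}$ and a unique function $\hat\theta(x)$ from $R_s$ to $R_k$ continuous in $N$, such that $\hat\theta(P(\theta)) = \theta$ for $\theta \in \Theta$, and $F(x, \hat\theta(x)) = 0$ for $x \in N$. Moreover, $\mathcal{L}\{\sqrt{n}(\hat\theta(Z_n) - \theta) \mid \theta\} \to \mathcal{L}(Y)$ where $Y$ is a normal random vector with mean zero and variance-covariance matrix given by

$$[b(\theta)\dot P(\theta)']^{-1}\, b(\theta)\,\Sigma(\theta)\,b(\theta)'\,[\dot P(\theta)\,b(\theta)']^{-1}.$$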

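PROOF. $F(P(\theta), \theta) = 0$ and

$$F_\theta(x, \theta) = f_\theta(x, \theta)(x - P(\theta)) - \dot P(\theta)\,f(x, \theta)' \tag{3.5}$$

where $f_\theta(x, \theta)$ represents the $k \times k \times s$ cubic matrix of partial derivatives of the $k \times s$ matrix $f(x, \theta)$ with respect to $\theta$. To avoid confusion we will write out the first term of this difference completely. Denote the function in the $i$th row, $j$th column of $f(x, \theta)$ by $f_{ij}(x, \theta)$, and let $P_j(\theta)$ and $x_j$ represent the $j$th component of the vectors $P(\theta)$ and $x$. Then,

$$f_\theta(x, \theta)(x - P(\theta)) = \sum_{j=1}^{s} \begin{pmatrix} \dfrac{\partial}{\partial\theta_1} f_{1j} & \cdots & \dfrac{\partial}{\partial\theta_1} f_{kj} \\ \vdots & & \vdots \\ \dfrac{\partial}{\partial\theta_k} f_{1j} & \cdots & \dfrac{\partial}{\partial\theta_k} f_{kj} \end{pmatrix} (x_j - P_j(\theta)). \tag{3.6}$$

It is now easily checked that formula (3.5) holds. Hence,

$$F_\theta(P(\theta), \theta) = -\dot P(\theta)\,b(\theta)' \tag{3.7}$$

which, by assumption, is nonsingular for every $\theta \in \Theta$. Thus the hypotheses of the lemma of the previous section are satisfied and the first part of the theorem is proved.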

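To prove the second part, expand $F(x, \theta)$ about the point $\hat\theta(x)$ to one term using the formula

$$F(x, \theta) = F(x, \hat\theta(x)) + \left[\int_0^1 F_\theta\{x, \hat\theta(x) + \lambda(\theta - \hat\theta(x))\}\,d\lambda\right]' (\theta - \hat\theta(x)) \tag{3.8}$$

which may easily be checked. By the integral of a matrix we mean the matrix of the integrals of each term separately. For each $\theta \in \Theta$, formula (3.8) is valid whenever $x$ is sufficiently close to $P(\theta)$, so that $(x, \hat\theta(x))$ is in a spherical neighborhood of $(P(\theta), \theta)$ contained entirely in $N_0$. We may replace $x$ by $Z_n$ in (3.8), use $F(Z_n, \hat\theta(Z_n)) = 0$, and multiply both sides by $\sqrt{n}$:

$$\sqrt{n}\left[-\int_0^1 F_\theta\{Z_n, \hat\theta(Z_n) + \lambda(\theta - \hat\theta(Z_n))\}\,d\lambda\right]' (\hat\theta(Z_n) - \theta) = f(Z_n, \theta)\,\sqrt{n}\,(Z_n - P(\theta)). \tag{3.9}$$

We now invoke the theorems of Slutsky (see [10], section 2, theorem 2, or [4]). From assumption 2, $\mathcal{L}(Z_n \mid \theta) \to \mathcal{L}(P(\theta))$. Hence by Slutsky's theorem, since $f(x, \theta)$ is continuous in a neighborhood of $(P(\theta), \theta)$,

$$\mathcal{L}(f(Z_n, \theta) \mid \theta) \to \mathcal{L}(f(P(\theta), \theta)) = \mathcal{L}(b(\theta)). \tag{3.10}$$

Slutsky's theorem also gives

$$\mathcal{L}(f(Z_n, \theta)\sqrt{n}(Z_n - P(\theta)) \mid \theta) \to \mathcal{L}(b(\theta)Z) \tag{3.11}$$

where $Z$ is a normal vector with zero mean and variance-covariance matrix $\Sigma(\theta)$. Since $\mathcal{L}(Z_n \mid \theta) \to \mathcal{L}(P(\theta))$ and $\mathcal{L}(\hat\theta(Z_n) \mid \theta) \to \mathcal{L}(\hat\theta(P(\theta))) = \mathcal{L}(\theta)$, we may apply the Lebesgue bounded convergence theorem to the integral in (3.9):

$$\mathcal{L}\left\{\int_0^1 F_\theta[Z_n, \hat\theta(Z_n) + \lambda(\theta - \hat\theta(Z_n))]\,d\lambda \,\Big|\, \theta\right\} \to \mathcal{L}\left\{\int_0^1 F_\theta[P(\theta), \theta]\,d\lambda\right\} = \mathcal{L}\{F_\theta(P(\theta), \theta)\} = \mathcal{L}\{-\dot P(\theta)\,b(\theta)'\} \tag{3.12}$$

by equation (3.7). Another application of Slutsky's theorem allows us to conclude

$$\mathcal{L}\{\sqrt{n}(\hat\theta(Z_n) - \theta) \mid \theta\} \to \mathcal{L}\{[b(\theta)\dot P(\theta)']^{-1}\,b(\theta)Z\}. \tag{3.13}$$

Denoting $[b(\theta)\dot P(\theta)']^{-1}\,b(\theta)Z$ by $Y$, we see that $Y$ is a normal random vector, with mean zero and covariance matrix

$$EYY' = E[b(\theta)\dot P(\theta)']^{-1}\,b(\theta)ZZ'b(\theta)'\,[\dot P(\theta)\,b(\theta)']^{-1} = [b(\theta)\dot P(\theta)']^{-1}\,b(\theta)\,\Sigma(\theta)\,b(\theta)'\,[\dot P(\theta)\,b(\theta)']^{-1}. \tag{3.14}$$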

3.3. Applications. The theorem just proved allows some immediate inferences. The important point to notice in this theorem is that the asymptotic distribution of $\sqrt{n}(\hat\theta(Z_n) - \theta)$ depends on the function $f(x, \theta)$ only through its values along the curve $\{(P(\theta), \theta) : \theta \in \Theta\}$. Thus if the linear form

$$F(Z_n, \theta) = f(Z_n, \theta)(Z_n - P(\theta))$$

has a root which is already a B.A.N. estimate of $\theta$, any linear form

$$g(Z_n, \theta)(Z_n - P(\theta)),$$

in which the function $f(x, \theta)$ is replaced by any function $g(x, \theta)$ satisfying assumption 4 and for which $g(P(\theta), \theta) = f(P(\theta), \theta)$, will have a root which is also a B.A.N. estimate of $\theta$, since the asymptotic distribution of the two roots will be the same.

For example, equation (3.2) (neglecting the factor $n$ which is immaterial as far as roots are concerned) is a linear form of the type $f(Z_n, \theta)(Z_n - P(\theta))$ for which

$$f(Z_n, \theta) = \dot P(\theta)\,M(Z_n). \tag{3.15}$$

Since $M(Z_n)$ converges in probability to $\Sigma(\theta)^{-1}$ when $\theta$ is the true value of the parameter, $M(P(\theta)) = \Sigma(\theta)^{-1}$ so that

$$b(\theta) = \dot P(\theta)\,\Sigma(\theta)^{-1}. \tag{3.16}$$

Now consider functions

$$f_1(Z_n, \theta) = b(\theta) \quad \text{and} \quad f_2(Z_n, \theta) = L(Z_n)\,M(Z_n) \tag{3.17}$$

where $L$ is a matrix continuous in a neighborhood of $\{P(\theta) : \theta \in \Theta\}$, such that $L(P(\theta)) = \dot P(\theta)$. If $f_1(Z_n, \theta)$ is used, we must also assume that $b(\theta)$ has a continuous derivative. In these circumstances, whenever the root to equation (3.2) is a B.A.N. estimate, roots to the linear forms involving $f_1(Z_n, \theta)$ and $f_2(Z_n, \theta)$ will be B.A.N. also.

Now we will show directly the exact conditions under which there will be a root of a linear form which will be "best" out of the class of all roots of linear forms; that is, the exact conditions under which there is a value of $b(\theta)$ which minimizes the variance (3.14).

Of two $n$ by $n$ matrices, $A$ and $B$, $A$ will be said to be smaller than $B$, in symbols $A < B$, if and only if $B - A$ is positive semi-definite; that is, if

$$x'[B - A]x \ge 0$$

for every $n$-dimensional vector $x$. Thus of two unbiased estimates of a vector parameter $\theta$, $T_1$ and $T_2$, with covariance matrices respectively $A$ and $B$, $T_1$ would be preferred to $T_2$ if $A < B$, since the unbiased estimate $x'T_1$ of the parameter $x'\theta$ will have a variance no larger than that of the unbiased estimate $x'T_2$ of the same parameter.

THEOREM 2. If in addition to assumptions 1 through 5 there exists an $s$ by $s$ nonsingular matrix $\Sigma_0(\theta)$ such that

$$\Sigma(\theta)\,\Sigma_0(\theta)\,\dot P(\theta)' = \dot P(\theta)' \tag{3.18}$$

then the asymptotic covariance matrix of $\hat\theta(Z_n)$ takes on its minimum value when $b(\theta) = \dot P(\theta)\,\Sigma_0(\theta)$. The minimum value is then $[\dot P(\theta)\,\Sigma_0(\theta)\,\dot P(\theta)']^{-1}$.

PROOF. For simplicity of notation the $\theta$ will be omitted. From assumption 5, $\dot P$ is of full rank so that $[\dot P\Sigma_0\dot P']$ is nonsingular. The inequality

$$(b'[\dot P b']^{-1} - \Sigma_0\dot P'[\dot P\Sigma_0\dot P']^{-1})'\;\Sigma\;(b'[\dot P b']^{-1} - \Sigma_0\dot P'[\dot P\Sigma_0\dot P']^{-1}) \ge 0 \tag{3.19}$$

which holds since $\Sigma$ is positive semi-definite, yields

$$[b\dot P']^{-1}\,b\,\Sigma\,b'\,[\dot P b']^{-1} - [\dot P\Sigma_0\dot P']^{-1} \ge 0. \tag{3.20}$$

Yet it is easily checked that equality is attained if $b = \dot P\Sigma_0$. q.e.d.

The assumption of the existence of a matrix $\Sigma_0(\theta)$ satisfying (3.18) holds for example when $\Sigma(\theta)$ is nonsingular. Then $b(\theta) = \dot P(\theta)\,\Sigma(\theta)^{-1}$ as was found in equation (3.16). However, in other important cases, for example in the multinomial case with the $\chi^2$ of equation (2.2), the matrix $\Sigma(\theta)$ is singular. The following lemma, which may be proved without difficulty, will perhaps be of aid in checking whether a $\Sigma_0$ satisfying (3.18) exists at all.

LEMMA. In order that there exist a nonsingular matrix $\Sigma_0(\theta)$ satisfying (3.18), it is necessary and sufficient that the range space of $\dot P(\theta)'$ be contained in the range space of $\Sigma(\theta)$: that is, for every vector $x$ there exists a vector $y(\theta)$ such that

$$\Sigma(\theta)\,y(\theta) = \dot P(\theta)'x.$$
In certain cases one can find the matrix $\Sigma_0$ which satisfies (3.18). We shall do it now for the multinomial case. In this case the vector $P(\theta)$ is simply the vector of cell probabilities, and is $s + 1$ dimensional. The matrix $\Sigma(\theta)$ is found to be

$$\Sigma(\theta) = \begin{pmatrix} p_1(\theta) - p_1^2(\theta) & -p_1(\theta)p_2(\theta) & \cdots & -p_1(\theta)p_{s+1}(\theta) \\ -p_1(\theta)p_2(\theta) & p_2(\theta) - p_2^2(\theta) & & \vdots \\ \vdots & & \ddots & \\ -p_1(\theta)p_{s+1}(\theta) & \cdots & & p_{s+1}(\theta) - p_{s+1}^2(\theta) \end{pmatrix} \tag{3.21}$$

which may be expressed simply as

$$\Sigma(\theta) = B(\theta) - P(\theta)P(\theta)' \tag{3.22}$$

where $B(\theta)$ is the diagonal matrix

$$B(\theta) = \begin{pmatrix} p_1(\theta) & 0 & \cdots & 0 \\ 0 & p_2(\theta) & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & p_{s+1}(\theta) \end{pmatrix}. \tag{3.23}$$

Then, as suggested by the $\chi^2$ of (2.2), put $\Sigma_0(\theta) = B(\theta)^{-1}$.

$$\Sigma(\theta)\,\Sigma_0(\theta)\,\dot P(\theta)' = B(\theta)B(\theta)^{-1}\dot P(\theta)' - P(\theta)P(\theta)'B(\theta)^{-1}\dot P(\theta)'. \tag{3.24}$$

It is easily seen that

$$P(\theta)'B(\theta)^{-1}\dot P(\theta)' = \left(\frac{\partial}{\partial\theta_1}\sum_{i=1}^{s+1} p_i(\theta),\; \dots,\; \frac{\partial}{\partial\theta_k}\sum_{i=1}^{s+1} p_i(\theta)\right). \tag{3.25}$$

This vector must be zero since $\sum_{i=1}^{s+1} p_i(\theta) = 1$. Hence, the equality (3.18) is satisfied. Thus applying Theorem 2, roots of the linear form

$$\sum_{i=1}^{s+1} (z_i - p_i(\theta))\,f_{ij}(z_1, \dots, z_{s+1}, \theta) = 0, \quad j = 1, \dots, k \tag{3.26}$$

will be "best" when $f_{ij}(p_1(\theta), \dots, p_{s+1}(\theta), \theta) = \partial/\partial\theta_j\, \log p_i(\theta)$.

It may further be shown in the multinomial case, that if the $f_{ij}(z, \theta)$ are chosen to be independent of $z$, and equal to $\partial/\partial\theta_j\, \log p_i(\theta)$, equation (3.26) will be the derivative of the log of the likelihood function set equal to zero, so that one has immediately that the maximum likelihood estimate, in addition to the minimum modified $\chi^2$ estimate, is an estimate which is given by the root of a certain linear form. One would expect that the linear form (3.26) in which the functions $f_{ij}$ do not depend on $\theta$ at all would be somewhat easier to solve for $\theta$. It is this type of linear form which is suggested in section 4 as a method for estimating the bacterial density in a liquid.
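
The optimality claim of Theorem 2 can be checked numerically in the scalar case $k = 1$, where the covariance of Theorem 1 reduces to $(b\Sigma b')/(b\dot P')^2$. The sketch below is an illustration only: the trinomial model $p(\theta) = (\theta^2, 2\theta(1-\theta), (1-\theta)^2)$ and the competing weight vector are assumptions, not examples from the paper.

```python
def multinomial_sigma(p):
    """Sigma = B - P P' as in (3.22), for a list p of cell probabilities."""
    m = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]


def asymptotic_variance(b, p, dp):
    """Scalar (k = 1) case of the Theorem 1 covariance: (b Sigma b') / (b Pdot')^2."""
    sigma = multinomial_sigma(p)
    m = len(p)
    quad = sum(b[i] * sigma[i][j] * b[j] for i in range(m) for j in range(m))
    cross = sum(bi * di for bi, di in zip(b, dp))
    return quad / cross ** 2


theta = 0.4  # hypothetical true parameter value
p = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
dp = [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]   # derivatives of the p_i

b_best = [d / q for d, q in zip(dp, p)]  # b = Pdot * B^{-1}, i.e. d/dtheta log p_i
b_other = [1.0, 0.0, 0.0]                # linear form z_1 - theta^2 = 0

v_best = asymptotic_variance(b_best, p, dp)
v_other = asymptotic_variance(b_other, p, dp)
```

For this model `v_best` works out to $\theta(1-\theta)/2$ and `v_other` to $(1-\theta^2)/4$, so the "best" weights give the smaller limiting variance for every $\theta$ in $(0, 1)$.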

We will now apply the preceding theorem to the various minimum $\chi^2$ methods discussed previously.

Application to the transformed $\chi^2$. The method of generating B.A.N. estimates described in Theorems 1 and 2 also applies easily to the transformed $\chi^2$ of equations (2.8) and (2.10). For example, the derivative of the $\chi^2$ of equation (2.8) with the matrix $T(Z_n)$ of the form depending on $Z_n$ only, and not on $\theta$, is found from

$$-\tfrac{1}{2}\,\dot\chi^2_T = n\dot P(\theta)\,\dot g(P(\theta))\,T(Z_n)\,(g(Z_n) - g(P(\theta))). \tag{3.27}$$

Assumption 2 of Theorem 1 becomes in this case

$$\mathcal{L}\{\sqrt{n}[g(Z_n) - g(P(\theta))] \mid \theta\} \to \mathcal{L}(Z) \tag{3.28}$$

where $Z$ is a normal random vector with zero mean and variance-covariance matrix $[\dot g(P(\theta))\,\Sigma(\theta)\,\dot g(P(\theta))']$. This may easily be checked by expanding $g(Z_n)$ in a Taylor series about the point $P(\theta)$, and invoking asymptotic normality of $\sqrt{n}(Z_n - P(\theta))$. The only requirement on the function $g(x)$ is that it have a continuous derivative in a neighborhood of the curve $\{P(\theta) : \theta \in \Theta\}$. If in addition $\dot g(P(\theta))$ is nonsingular for each $\theta \in \Theta$, $[\dot g(P(\theta))\,\Sigma(\theta)\,\dot g(P(\theta))']^{-1}$ will exist and $b(\theta)$ is found to be

$$b(\theta) = \dot P(\theta)\,\dot g(P(\theta))\,[\dot g(P(\theta))\,\Sigma(\theta)\,\dot g(P(\theta))']^{-1}. \tag{3.29}$$

Thus, if the root to equation (3.27) is a B.A.N. estimate, the root to the linear form

$$f(Z_n, \theta)(g(Z_n) - g(P(\theta))) = 0 \tag{3.30}$$

will also be a B.A.N. estimate, provided that $f$ satisfies Assumption 4, and that $f(P(\theta), \theta) = b(\theta)$.


The linear form corresponding to the transformed multinomial $\chi^2$ of (2.10) may be computed as before. It becomes

$$\sum_{i=1}^{s+1} (g_i(z_i) - g_i(p_i(\theta)))\,f_{ij}(z_1, \dots, z_{s+1}, \theta) = 0, \quad j = 1, \dots, k \tag{3.31}$$

where

$$f_{ij}(p_1(\theta), \dots, p_{s+1}(\theta), \theta) = \left[\frac{\partial}{\partial\theta_j}\,p_i(\theta)\right] \frac{1}{p_i(\theta)\,g_i'(p_i(\theta))}. \tag{3.32}$$

Under assumptions 1 through 5, and the assumptions that each $g_i'(x)$ is continuous in a neighborhood of the curve $\{z : z = p_i(\theta),\ \theta \in \Theta\}$ and that

$$g_i'(p_i(\theta)) \neq 0,$$

the roots to equation (3.31) will be B.A.N. estimates of the parameters.

Application to the expansion of $\chi^2$ in a Taylor series. Let $\theta_n^*$ be a $O(\sqrt{n})$-consistent estimate of the parameter $\theta$. To find the minimum value of the right hand side of equation (2.12) without the remainder term, we take a derivative and solve for the root $\hat\theta_n$.

$$\hat\theta_n = \theta_n^* - \ddot\chi^2(\theta_n^*)^{-1}\,\dot\chi^2(\theta_n^*) \tag{3.33}$$

If we use the modified $\chi^2$ of equation (2.3) for this procedure with $M$ a function of $Z_n$ only, for example $M(Z_n) = \Sigma(\theta_n^*)^{-1}$, the first two derivatives are

$$\begin{aligned} \dot\chi^2(\theta) &= -2n\dot P(\theta)\,\Sigma(\theta_n^*)^{-1}(Z_n - P(\theta)) \\ \ddot\chi^2(\theta) &= 2n\dot P(\theta)\,\Sigma(\theta_n^*)^{-1}\dot P(\theta)' - 2n\ddot P(\theta)\,\Sigma(\theta_n^*)^{-1}(Z_n - P(\theta)) \end{aligned} \tag{3.34}$$

where $\ddot P(\theta)$ is the $k \times k \times s$ cubic matrix of second partial derivatives of the vector $P(\theta)$.

If, on the other hand, we take the linear form with the function $f(Z_n, \theta)$ not depending on $\theta$, say to be $\dot P(\theta_n^*)\,\Sigma(\theta_n^*)^{-1}$, and expand it about $\theta_n^*$ to the first power and solve for $\theta$, we have

$$\hat\theta_n = \theta_n^* + [\dot P(\theta_n^*)\,\Sigma(\theta_n^*)^{-1}\dot P(\theta_n^*)']^{-1}\,\dot P(\theta_n^*)\,\Sigma(\theta_n^*)^{-1}(Z_n - P(\theta_n^*)). \tag{3.35}$$

If one compares the estimates (3.35) with the estimates (3.33) with equations (3.34) substituted, one sees that the former require less computation, and that by the amount in the second term of the expression for $\ddot\chi^2(\theta)$, involving all the second partial derivatives of the vector $P(\theta)$. Furthermore, computation of $[\dot P(\theta_n^*)\,\Sigma(\theta_n^*)^{-1}\dot P(\theta_n^*)']^{-1}$ would give an estimate of the limiting variance-covariance matrix of the B.A.N. estimate $\hat\theta_n$.
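
A one-step correction of the type (3.35) is easy to program. The following is a minimal sketch for a single parameter in a multinomial model; because the multinomial $\Sigma(\theta)$ is singular, the sketch assumes that $\Sigma^{-1}$ is replaced by $\Sigma_0 = B^{-1}$ (the diagonal matrix of reciprocal cell probabilities), as Theorem 2 suggests. The trinomial model, the data and the preliminary estimate are hypothetical.

```python
def one_step(z, theta_star, probs, dprobs):
    """One-step estimate in the spirit of (3.35), scalar parameter, Sigma_0 = B^{-1}.

    z          observed cell frequencies (summing to one)
    theta_star preliminary estimate
    probs      function theta -> cell probabilities p_i(theta)
    dprobs     function theta -> derivatives dp_i/dtheta
    """
    p = probs(theta_star)
    dp = dprobs(theta_star)
    # Pdot * Sigma_0 * Pdot'  (a scalar when k = 1)
    info = sum(d * d / q for d, q in zip(dp, p))
    # Pdot * Sigma_0 * (Z_n - P(theta_star))
    score = sum(d * (zi - q) / q for d, zi, q in zip(dp, z, p))
    return theta_star + score / info


def probs(theta):
    """Hypothetical trinomial model used only for illustration."""
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)


def dprobs(theta):
    return (2 * theta, 2 - 4 * theta, -2 * (1 - theta))


z = (0.18, 0.46, 0.36)              # hypothetical observed frequencies
theta_hat = one_step(z, 0.30, probs, dprobs)
```

Only first derivatives of $P(\theta)$ appear, and the reciprocal of `info` (divided by $n$) serves as the variance estimate mentioned in the text. In this particular model a single step from any starting value lands on $z_1 + z_2/2$, which is a property of the example rather than of the method in general.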

This method would be good for example in estimating the parameters of a 
Neyman type A distribution, where the vector P(@) is a rather complicated 
function of the parameters, and other methods of getting B.A.N. estimates 
are rather difficult. This method has been applied by Robert Read of the Statis- 
tical Laboratory of the University of California, to estimating the parameters 
in a probabilistic model describing ionization in a cloud chamber, using as the 
preliminary estimates, those given by the method of moments. It has also been 
applied by Dr. Irene Rosenthal of the Psychology Department at the Univer- 
sity of California, to estimate the parameters of a latent structure, using as first 
estimates those of Lazarsfeld [11]. 


4. Application to the problem of estimating bacterial density by the dilution method. The method of estimating the bacterial density of a liquid by taking samples in fermentation tubes at several levels of dilution of the liquid is well known. As far back as 1915 [12] the maximum likelihood estimate, called the most probable number (M.P.N.) by biometricians, was suggested for the problem, and is still being used today in public health for water, milk, and sewage analysis. This and other estimates have been studied by Fisher [13], Halvorson and Ziegler [14], and Matuszewski, Neyman, and Supinska [15].

The situation is the following. We are given a large volume $V$ of a liquid containing a large number $N$ of bacteria, and we are interested in estimating the bacterial density $\lambda = N/V$, the number of bacteria per unit volume. A sample of size $a$ unit volume is withdrawn and tested by some device such as placing the sample in a fermentation tube to see if any bacteria are present. It is assumed that each bacterium acts independently and that each has the same probability $a/V$ of being in the sample. Thus the number of bacteria in the sample will be binomially distributed with probability $a/V$ and size $N$; however, if $a/V$ is small and $N$ is large the distribution may conveniently be replaced by a Poisson with parameter $Na/V = a\lambda$. The probability that no bacteria appear in the sample is then $p = e^{-a\lambda}$. If $n$ independent samples of size $a$ are withdrawn and tested, the number $K$ of sterile samples will be binomially distributed with probability $p$ and size $n$, and may be used to estimate the parameter $\lambda$. However, the value of the experiment depends to a great extent on choosing $a$ so that $p = e^{-a\lambda}$ will be in a good estimating range, for if $p$ is too small or too close to one, one will obtain too many fertile or too many sterile samples to be able to estimate $\lambda$ with much accuracy. And since $\lambda$ is unknown it will usually be impossible to choose $a$ so that $e^{-a\lambda}$ will be moderately between zero and one. So one usually takes several sizes of sample volumes $a_1, a_2, \dots, a_s$, called dilution levels, and numbers of samples $n_1, n_2, \dots, n_s$ at each of the levels, with the hope that at least one of the $e^{-a_i\lambda}$ will be in a good estimating range. Then the numbers $k_1, k_2, \dots, k_s$ of sterile samples at each of the levels will be used to estimate $\lambda$.
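
The replacement of the binomial by the Poisson in the preceding paragraph can be checked directly. The sketch below compares the exact binomial probability of a sterile sample, $(1 - a/V)^N$, with the approximation $e^{-a\lambda}$; the numerical values of $N$, $V$ and $a$ are hypothetical.

```python
import math


def sterile_prob_binomial(N, V, a):
    """Exact probability that none of N independent bacteria falls in a sample of volume a."""
    return (1.0 - a / V) ** N


def sterile_prob_poisson(lam, a):
    """Poisson approximation p = exp(-a * lambda) with density lambda = N / V."""
    return math.exp(-a * lam)


N, V, a = 50000, 1000.0, 0.02       # hypothetical: density lambda = 50 per unit volume
p_exact = sterile_prob_binomial(N, V, a)
p_approx = sterile_prob_poisson(N / V, a)
```

With $a/V$ this small the two probabilities agree to several decimal places, which is why the later formulas work with $e^{-a_i\lambda}$ alone.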

The most frequently used B.A.N. estimate of the bacterial density seems to be the maximum likelihood estimate, since the minimum $\chi^2$ estimates appear to be much more difficult to compute. The maximum likelihood estimate of $\lambda$ is that value of $\lambda$ which is a root of the equation

$$\sum_{i=1}^{s} \frac{(n_i - k_i)\,a_i}{1 - e^{-a_i\lambda}} = \sum_{i=1}^{s} n_i a_i. \tag{4.1}$$

Methods of solving this equation have been discussed by Halvorson and Ziegler [14], Barkworth and Irwin [16], and Finney [17]. Tables of the estimate for certain situations may be found in Halvorson and Ziegler and in Hoskins [18].
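
As an illustration of what solving (4.1) involves, here is a minimal sketch that finds the root by bisection. Bisection is chosen here only for robustness and is not one of the methods of the cited authors; the function name and the guard conditions are assumptions of this example.

```python
import math


def mle_density(a, n, k, tol=1e-12):
    """Root in lambda of (4.1): sum (n_i - k_i) a_i / (1 - exp(-a_i lambda)) = sum n_i a_i.

    a: sample volumes, n: tubes per level, k: sterile tubes per level.
    """
    if all(ki == 0 for ki in k) or all(ki == ni for ki, ni in zip(k, n)):
        raise ValueError("need at least one sterile and one fertile tube")
    target = sum(ni * ai for ni, ai in zip(n, a))

    def lhs(lam):
        # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x
        return sum((ni - ki) * ai / -math.expm1(-ai * lam)
                   for ai, ni, ki in zip(a, n, k))

    lo, hi = 1e-12, 1.0
    while lhs(hi) > target:          # the left side decreases as lambda grows
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical three-level experiment, five tubes per level.
lam_ml = mle_density([1.0, 0.1, 0.01], [5, 5, 5], [0, 2, 5])
```

Each evaluation of the left side requires $(1 - e^{-a_i\lambda})^{-1}$ for every dilution level, which is the computational burden the next paragraphs set out to reduce.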

An application of the methods of the previous section will yield a B.A.N. estimate which is slightly easier to compute. Linear forms which lead to B.A.N. estimates are of the type

$$\sum_{i=1}^{s} n_i\,f_i(z, \lambda)\,(z_i - e^{-a_i\lambda}) \tag{4.2}$$

where $z_i$ represents the frequency of sterile tubes at the $i$th level of dilution, $z_i = k_i/n_i$, and $f_i(z, \lambda)$ converges in probability to $a_i(1 - e^{-a_i\lambda})^{-1}$, $z$ representing the vector $(z_1, \dots, z_s)$. Equation (4.2) with $f_i(z, \lambda)$ always equal to

$$a_i(1 - e^{-a_i\lambda})^{-1}$$

is equivalent to the maximum likelihood equation (4.1).

We would like to replace $f_i(z, \lambda)$ in equation (4.2) completely by an estimate, that is, $f_i(z, \lambda) = a_i/(1 - z_i)$, but we must take care of the cases in which $z_i$ is equal to one. So we may choose $f_i(z, \lambda) = a_i/(1 - z_i)$ if $z_i \neq 1$ and

$$f_i(z, \lambda) = a_i(1 - e^{-a_i\lambda})^{-1}$$

if $z_i = 1$. This will lead to a B.A.N. estimate since eventually as the $n_i$ get large without bound, all the $z_i$ will be different from one. We have the equation

$$\sum_{z_i \neq 1} \frac{n_i a_i\,(z_i - e^{-a_i\lambda})}{1 - z_i} + \sum_{z_i = 1} n_i a_i = 0. \tag{4.3}$$

Written in simpler form, this equation becomes

$$\sum_{z_i \neq 1} \frac{n_i a_i}{1 - z_i}\,e^{-a_i\lambda} = \sum_{z_i \neq 1} \frac{n_i a_i z_i}{1 - z_i} + \sum_{z_i = 1} n_i a_i. \tag{4.4}$$

This equation is simpler to solve than equation (4.1) in that it only requires tables of $e^{-x}$ which are readily available, while equation (4.1) requires for its solution the computation of $(1 - e^{-a_i\lambda})^{-1}$ separately for each $i$ or tables of $(e^{x} - 1)^{-1}$ or $(1 - e^{-x})^{-1}$. The method by which it is suggested that (4.4) be solved is the same as that suggested by other authors in connection with the solution of (4.1), and that is Newton's method. For a function $f(x)$ with a continuous first derivative, if $x_0$ is taken to be the initial guess at the solution of $f(x) = 0$, $x_n$ is defined inductively by

$$x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}. \tag{4.5}$$

Applying this procedure to equation (4.4), we obtain the inductive formula

$$\lambda_n = \lambda_{n-1} + \frac{\displaystyle\sum_{z_i \neq 1} \frac{n_i a_i}{1 - z_i}\,e^{-a_i\lambda_{n-1}} - \sum_{z_i \neq 1} \frac{n_i a_i z_i}{1 - z_i} - \sum_{z_i = 1} n_i a_i}{\displaystyle\sum_{z_i \neq 1} \frac{n_i a_i^2}{1 - z_i}\,e^{-a_i\lambda_{n-1}}}. \tag{4.6}$$
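
The iteration (4.6) can be sketched in a few lines. The starting value $\lambda_0 = 0$, the stopping rule and the function name below are assumptions of this example; the data are hypothetical.

```python
import math


def newton_density(a, n, k, lam0=0.0, iters=50, tol=1e-12):
    """Newton iteration (4.6) for the root of equation (4.4).

    a: sample volumes, n: tubes per level, k: sterile tubes per level.
    """
    z = [ki / ni for ki, ni in zip(k, n)]
    # Levels with z_i != 1 carry the exponential terms of (4.4).
    coef = [(ni * ai / (1.0 - zi), ai) for ai, ni, zi in zip(a, n, z) if zi != 1.0]
    if not coef:
        raise ValueError("every level is completely sterile; (4.4) has no exponential term")
    const = (sum(ni * ai * zi / (1.0 - zi) for ai, ni, zi in zip(a, n, z) if zi != 1.0)
             + sum(ni * ai for ai, ni, zi in zip(a, n, z) if zi == 1.0))
    lam = lam0
    for _ in range(iters):
        h = sum(c * math.exp(-ai * lam) for c, ai in coef) - const     # numerator of (4.6)
        slope = sum(c * ai * math.exp(-ai * lam) for c, ai in coef)    # denominator of (4.6)
        step = h / slope
        lam += step
        if abs(step) < tol:
            break
    return lam


# Hypothetical three-level experiment, five tubes per level.
lam_hat = newton_density([1.0, 0.1, 0.01], [5, 5, 5], [0, 2, 5])
```

Each pass needs only the values $e^{-a_i\lambda_{n-1}}$, in line with the remark that tables of $e^{-x}$ suffice. Starting at zero is a convenience here, and a preliminary estimate could be used instead.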

The author has made a numerical study of the small sample properties of this estimate, the minimum $\chi^2$ estimate and the maximum likelihood estimate, which he hopes to publish at a later date. An indication is given in this study that in general the estimate given by equation (4.4) has slightly better small sample properties in the sense of bias and root mean square error, than either the maximum likelihood or the minimum $\chi^2$ estimate.

In conclusion, I would like to express my thanks to Professor L. Le Cam for his generous advice and helpful discussions concerning this paper.

FAMILIES OF DESIGNS FOR TWO SUCCESSIVE EXPERIMENTS


By G. H. FREEMAN 
East Malling Research Station, Maidstone, England 


It is sometimes desirable, particularly in experimentation with perennial 
plants, to conduct an experiment on plots already used for a previous trial. 
Various designs are known that facilitate this process (Hoblyn et al., 1954), the 
following notation being used to describe types of design. The letters O, T and P 
refer respectively to designs that are orthogonal, totally balanced—-i.e., balanced 
incomplete blocks—-and partially balanced; then a design of type X: YZ where 
X, Y or Z may be any of O, T or P, is one in which the arrangement of the first 
set of treatments with respect to blocks is of type X, that of the second set of 
treatments to blocks is of type Y and that of the second set of treatments to the 
first is of type Z. It is assumed that the two sets of treatments are non-interacting 
and that designs of type T or P may be extended, i.e., have complete replicates 
in each block. Then, if the first trial is in randomised blocks, type O, the only 
design that has not previously been very fully discussed is type O:PP for which, 
however, general methods of analysis have been given (Freeman, 1957b). The 
purpose of this paper is to describe all known families of O: PP designs with two 
associate-classes. These, being designs with two orthogonal constraints, can also 
be regarded as row and column designs, and will henceforth be considered as such 
here. 

The families of O: PP designs described here include all those with any mem- 
bers likely to be of much practical use, i.e., having more than two replicates or 
treatments, not more than 30 replicates, treatments, rows or columns and not 
more than 150 plots in all. The possibilities of existence of all O: PP designs within 
these limits have been investigated by enumeration and all the tabulated designs 
have been found to exist. Where larger designs are required their existence can 
usually be readily determined and, particularly where the number of replicates 
greatly exceeds the number of treatments, there may be many possible designs. 
A catalogue of the designs in Tables II-V and VII has been prepared, and is 
available at East Malling Research Station; the construction of an individual 
design gives rise to no practical difficulty by trial and error, but no attempt has 
been made to find transformation sets for each design. 

Since, for practical purposes, O: PP designs are constrained to have the same 
associate-classes with respect to rows and columns their classification depends 
on that of designs of type P. These, partially balanced, designs have been de- 
scribed in great detail by Bose and his co-workers, who have provided an ex- 
tensive catalogue of such designs with two associate-classes, (Bose et al., 1954). 
Although this catalogue is now known not to be exhaustive (see, for example, 
Archbold and Johnson, 1956 and Freeman, 1957a) it does provide a basis for the 
classification of designs of type P and is thus adopted here for O:PP designs as
well. 


GROUP DIVISIBLE DESIGNS


The simplest O:PP designs are those that are group divisible, the property 
of group divisibility being such that designs group divisible one way are also 
group divisible the other. Group divisible partially balanced designs can be divided
into the three types, singular, semi-regular and regular, of which the first
and last will be denoted as S¹ and R respectively, as usual, while semi-regular
designs will be called H, so as to keep a one-letter code. As each type of group
divisible design can be combined with itself or each other to make an O:PP 
design there are thus six types of group divisible O:PP design, these being de- 
scribed, in an obvious notation, as SS, HH, HS, RR, RH, RS, and considered in 
this order. 

To classify group divisible O:PP designs into families it is first necessary to 
consider the various types of singular, semi-regular and regular designs. 


Singular designs. Families of singular designs are uniquely determined by 
balanced incomplete block designs (Bose and Connor, 1952), and thus a com- 
plete classification of the former is afforded by a corresponding one of the latter. 
The following types of balanced incomplete block design are considered, these 
not including every possible design but containing all that give rise to singular 
designs from which can be constructed O:PP designs of practicable size. We 
shall use the notation C[n; k] for the binomial coefficient $\binom{n}{k}$.


(i) $C[v - 1; k - 1]$ replicates of $v$ treatments on $C[v; k]$ blocks of $k$ plots each
(unreduced)

(ii) $(s + 1)$ replicates of $s^2$ treatments on $s(s + 1)$ blocks of $s$ plots each
(orthogonal series 1 or OS 1)

(iii) $(s^2 - 1)$ replicates of $s^2$ treatments on $s(s + 1)$ blocks of $s(s - 1)$ plots
each (complement of OS 1)

(iv) $(2t + 1)$ replicates of $(2t + 2)$ treatments on $(4t + 2)$ blocks of $(t + 1)$
plots each

(v) $(2t + 1)$ replicates of $(4t + 3)$ treatments on $(4t + 3)$ blocks of $(2t + 1)$
plots each

(vi) $(2t + 2)$ replicates of $(4t + 3)$ treatments on $(4t + 3)$ blocks of $(2t + 2)$
plots each.


The first of these types exists for all values of v and all k < v, types (ii) and 
(iii) for all values of s for which complete sets of orthogonal Latin squares exist, 
and types (iv)-(vi) when (4t + 3) is a prime-power (Bose, 1939); Bose also shows
geometrically that, though (4t + 3) is not a prime-power for t = 3, designs of 


1 The letters S here and T for triangular designs used later in the paper are also used 
respectively for designs with supplemented balance and total balance by Hoblyn et al. 
(1954), but there should be no confusion between the uses for types of balanced designs and 
families of partially balanced designs. 
types (iv)-(vi) are possible for t = 3. Types (v) and (vi) include the second orthogonal
series (OS 2) and its complement, respectively, when t = 1, the corresponding
value of s in the orthogonal series designs being s = 2.

The classification of balanced incomplete block designs into these six types is 
not mutually exclusive; for example, the design with 3 replicates of 4 treatments 
on 6 blocks of 2 plots each can be considered as belonging to each of types (i), 
(ii), (iii) and (iv). It follows that particular O: PP designs singular one way may 
belong to more than one family. For the sake of uniqueness, any one design of 
type P or O: PP will be regarded as a member of only one family, since the over- 
lapping of families is a direct consequence of the overlap of types of balanced 
incomplete block design and thus irrelevant to the consideration of O: PP designs. 
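The six types of balanced incomplete block design, and the overlap between them, can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the parameter expressions are those of types (i)-(vi) as listed above, and the check covers just the two necessary counting conditions $vr = bk$ and integrality of $\lambda = r(k-1)/(v-1)$; it says nothing about whether a design actually exists.

```python
from math import comb


def bibd_type(kind, v=None, k=None, s=None, t=None):
    """Return (replicates, treatments, blocks, plots per block) for types (i)-(vi)."""
    if kind == "i":
        return comb(v - 1, k - 1), v, comb(v, k), k
    if kind == "ii":
        return s + 1, s * s, s * (s + 1), s
    if kind == "iii":
        return s * s - 1, s * s, s * (s + 1), s * (s - 1)
    if kind == "iv":
        return 2 * t + 1, 2 * t + 2, 4 * t + 2, t + 1
    if kind == "v":
        return 2 * t + 1, 4 * t + 3, 4 * t + 3, 2 * t + 1
    if kind == "vi":
        return 2 * t + 2, 4 * t + 3, 4 * t + 3, 2 * t + 2
    raise ValueError(f"unknown type {kind!r}")


def counting_conditions_hold(r, v, b, k):
    """Necessary conditions only: v*r == b*k and lambda = r(k-1)/(v-1) integral."""
    return v * r == b * k and (r * (k - 1)) % (v - 1) == 0


# The overlapping example from the text: 3 replicates of 4 treatments
# on 6 blocks of 2 plots arises from types (i), (ii), (iii) and (iv).
overlap = {
    bibd_type("i", v=4, k=2),
    bibd_type("ii", s=2),
    bibd_type("iii", s=2),
    bibd_type("iv", t=1),
}
```

Here `overlap` collapses to a single parameter set, which is the non-uniqueness the text describes.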

A balanced incomplete block design with r* replicates of v* treatments on b* 
blocks of k* plots each gives an unextended singular P design with r* replicates 
of nv* treatments on b* blocks with nk* plots each, there being v* groups of n 
treatments each, n > 1. Thus, on allowing extended designs with p complete replicates
of treatments in each block and, further, the whole design repeated q
times, the most general singular P has the following numbers of replicates, treatments,
blocks and plots per block: q(r* + pb*), nv*, qb*, n(k* + pv*). The parameters
of all designs of type P will be written in this order henceforth; m will be
used instead of v* so as to consider m groups of treatments in conformity with the 
usual notation. 
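The parameters of the most general singular design follow mechanically from those of the underlying balanced incomplete block design. A minimal sketch, in which the function name and argument names are hypothetical and the four returned numbers are in the order replicates, treatments, blocks, plots per block:

```python
def singular_p(r_star, v_star, b_star, k_star, n, p=0, q=1):
    """Parameters of the extended singular design built from a BIBD (r*, v*, b*, k*).

    n = treatments per group, p = complete replicates added to each block,
    q = number of times the whole design is repeated.
    """
    replicates = q * (r_star + p * b_star)
    treatments = n * v_star
    blocks = q * b_star
    plots_per_block = n * (k_star + p * v_star)
    # Total plot count must agree when counted by treatments or by blocks.
    assert replicates * treatments == blocks * plots_per_block
    return replicates, treatments, blocks, plots_per_block


# Example: the BIBD with 3 replicates of 4 treatments on 6 blocks of 2 plots,
# each treatment replaced by a group of n = 2.
unextended = singular_p(3, 4, 6, 2, n=2)
extended = singular_p(3, 4, 6, 2, n=2, p=1, q=2)
```

The internal assertion holds identically because $r^* v^* = b^* k^*$ for the underlying design.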

The types of balanced incomplete block design enumerated above give rise to 
the following families of singular designs, where the inequalities are inserted, 
in S(ii)-(vi), to ensure the uniqueness of the families: 
S(i): $q(pm + k)\,C[m; k]/m$, $mn$, $q\,C[m; k]$, $n(pm + k)$  $(k < m)$

S(ii): $q(s + 1)(ps + 1)$, $ns^2$, $qs(s + 1)$, $ns(ps + 1)$  $(s > 2)$

S(iii): $q(s + 1)[s(p + 1) - 1]$, $ns^2$, $qs(s + 1)$, $ns[s(p + 1) - 1]$  $(s > 2)$

S(iv): $q(2t + 1)(2p + 1)$, $2n(t + 1)$, $2q(2t + 1)$, $n(t + 1)(2p + 1)$  $(t > 1)$

S(v): $q[2t(2p + 1) + 3p + 1]$, $n(4t + 3)$, $q(4t + 3)$, $n[2t(2p + 1) + 3p + 1]$  $(t > 0)$

S(vi): $q[2t(2p + 1) + 3p + 2]$, $n(4t + 3)$, $q(4t + 3)$, $n[2t(2p + 1) + 3p + 2]$  $(t > 0)$


Semi-regular designs. Bose et al. (1953) classify semi-regular designs according
as $\lambda_1$ does or does not equal zero, but this classification seems unnecessary for
the present purpose, $\lambda_1 = 0$ being merely a special case. Thus, from Bose et al.,
the design has $n\lambda_2/c$ replicates of $mn$ treatments in $m$ groups of $n$ on $n^2\lambda_2/c^2$
blocks of $mc$ plots, where $\lambda_1 = \dfrac{n\lambda_2(c - 1)}{c(n - 1)}$ is integral and $m \le \dfrac{n^2\lambda_2 - c^2}{c^2(n - 1)}$.
The extension of this design to allow for p complete replicates of the
treatments in each block and the whole design repeated q times gives the following
family of semi-regular designs:

H: $\dfrac{qn\lambda_2(pn + c)}{c^2}$, $mn$, $\dfrac{qn^2\lambda_2}{c^2}$, $m(pn + c)$,

where $\dfrac{n\lambda_2(c - 1)}{c(n - 1)}$ is integral and $m \le \dfrac{n^2\lambda_2 - c^2}{c^2(n - 1)}$.
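The conditions on the semi-regular family can be tried out numerically. The sketch below assumes the family H has $qn\lambda_2(pn + c)/c^2$ replicates, $mn$ treatments, $qn^2\lambda_2/c^2$ blocks and $m(pn + c)$ plots per block, with $\lambda_1 = n\lambda_2(c-1)/c(n-1)$ integral and $m \le (n^2\lambda_2 - c^2)/c^2(n-1)$; these expressions are a reading of the printed formulas and should be treated as an assumption, as should the function name.

```python
from fractions import Fraction


def semi_regular_family(m, n, c, lam2, p=0, q=1):
    """Return (replicates, treatments, blocks, plots per block) or None if excluded."""
    lam1 = Fraction(n * lam2 * (c - 1), c * (n - 1))
    m_bound = Fraction(n * n * lam2 - c * c, c * c * (n - 1))
    replicates = Fraction(q * n * lam2 * (p * n + c), c * c)
    blocks = Fraction(q * n * n * lam2, c * c)
    integral = all(x.denominator == 1 for x in (lam1, replicates, blocks))
    if not integral or m > m_bound:
        return None
    return int(replicates), m * n, int(blocks), m * (p * n + c)


# Example with n = 2, c = 1, lambda_2 = 1: the bound gives m <= 3.
h_small = semi_regular_family(m=3, n=2, c=1, lam2=1)
h_too_many_groups = semi_regular_family(m=4, n=2, c=1, lam2=1)
h_extended = semi_regular_family(m=3, n=2, c=1, lam2=1, p=1)
```

Exact rational arithmetic is used so that the integrality conditions are tested without rounding.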


Regular designs. These are more difficult to categorise than either of the other
types of group divisible design. Further, practicable O:PP designs cannot be 
derived from all types of regular design; two such types are those generated by 
the methods of differences and of omitting varieties (Bose et al., 1953), and thus 
these two types are not considered further. 

The types of regular design that are considered are as follows: 

(i) designs derivable by addition, 

(ii) designs with complete and incomplete groups, 

(iii) designs with groups arranged in sets, 

(iv) disconnected designs.

The first of these consists of all designs derivable
by addition of group divisible designs to other group divisible designs or to
balanced incomplete block designs, while the next two types are described elsewhere
(Freeman, 1957a). The fourth type is not considered in an unextended
form, as it has been shown (Freeman, 1957c) that this type of design cannot give
rise to an O:PP design; however, extended disconnected designs are of use for
the construction of O:PP designs.

In order to consider only those designs of type P that give rise to O: PP designs 
when the plots within each block are rearranged in accordance with a second 
classification further restrictions on the parameters are necessary in types 
R(ii) and R(iv). With these further restrictions, the four families of regular de- 
signs are as follows, there being p complete replicates of the treatments in each 
block and the whole design being repeated q times: 


R(i): $\dfrac{qR(k + pmn)}{k}$, $mn$, $\dfrac{qRmn}{k}$, $k + pmn$,

where

$R = ar + a'r'$; $a, a', r, r' > 0$; $k > 1$;

$\lambda_1(n - 1) + \lambda_2 n(m - 1) = r(k - 1)$, $\lambda_1'(n - 1) + \lambda_2' n(m - 1) = r'(k - 1)$,

$a\lambda_1 + a'\lambda_1' = \Lambda_1$, $a\lambda_2 + a'\lambda_2' = \Lambda_2$.

R(ii): $\dfrac{q\,C[m - 1; u]\,C[n; h]\,(nu + h + pmn)}{n}$, $mn$, $qm\,C[m - 1; u]\,C[n; h]$, $nu + h + pmn$,

where $0 < u < m$, $1 < h < n - 1$.

R(iii): $q(3n - 1)(pn + 1)$, $2n^2$, $qn(3n - 1)$, $2n(pn + 1)$.

R(iv): $\dfrac{qr(k + pmn)}{k}$, $mn$, $\dfrac{qrmn}{k}$, $k + pmn$,

where $r(k - 1)/(n - 1)$ is integral, $p > 0$, $1 < k < n - 1$.

There are m groups of n treatments in each design, where, for family R(iii),
m = 2n.


Construction of O:PP designs. Not every pair of designs of type P can give
rise to an O:PP design. Thus, families S(ii) and S(iii) are incompatible with
S(v) and S(vi), as $s^2$ is never of the form (4t + 3), a square being congruent
to 0 or 1 modulo 4. Also R(iii) is incompatible
with R(ii) and R(iv) as, in order to satisfy the relations $qn(3n - 1) = nu + h + pmn$
or $qn(3n - 1) = k + pmn$, h or k must be a multiple of n, an
impossibility since each lies between 1 and (n − 1). Further, S(iv) and R(iii) require
an even number of groups of treatments and so cannot be associated with
S(v) and S(vi), which require an odd number.

All the families of group divisible O:PP designs not excluded by the above
argument are given in Table I together with their derivation from the corresponding
families of P designs. As an example, consider family SS II, derived
from families S(ii) and S(i). For the numbers of groups of treatments to be the
same in the two designs the relation $m = s^2$ must be satisfied, while for the blocks
and plots per block of the two families to be interchangeable two further relations
must hold. Throughout Table I, to distinguish between the parameters in
the two families of type P, that written second has dashes, and so here p' and
q' refer to S(i) while p and q refer to S(ii). The relations between blocks and
plots per block then are: $qs(s + 1) = n(p's^2 + k)$, $q'C[s^2; k] = ns(ps + 1)$. Thus

$q = \dfrac{n(p's^2 + k)}{s(s + 1)}$ and $q' = \dfrac{ns(ps + 1)}{C[s^2; k]}$

as shown in Table I. The number of treatments in the design is $ns^2$, while the last
two columns of Table I show the numbers of blocks in the two designs of type
P, i.e., $qs(s + 1)$ for S(ii) and $q'C[s^2; k]$ for S(i). The number of replicates is
shown in Table I as $q(s + 1)(ps + 1)$, that corresponding to S(ii), although it
could be given in several forms; by convention, the number is given throughout
Table I in terms of the replicates of the family of type P given first.
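The two relations for family SS II can be turned into a small search aid. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes the relations $q = n(p's^2 + k)/s(s + 1)$ and $q' = ns(ps + 1)/C[s^2; k]$ quoted above. It tests integrality of q and q' alone and does not apply the further restrictions or the non-existence theorems.

```python
from math import comb


def ss2_design(s, n, k, p=0, p_dash=0):
    """Parameters of an SS II candidate, or None if q or q' is not an integer."""
    m = s * s  # number of groups, m = s^2
    q_num, q_den = n * (p_dash * m + k), s * (s + 1)
    qd_num, qd_den = n * s * (p * s + 1), comb(m, k)
    if q_num % q_den or qd_num % qd_den:
        return None
    q, q_dash = q_num // q_den, qd_num // qd_den
    return {
        "q": q,
        "q_dash": q_dash,
        "replicates": q * (s + 1) * (p * s + 1),
        "treatments": n * m,
        "blocks_S_ii": q * s * (s + 1),
        "blocks_S_i": q_dash * comb(m, k),
    }


# Trial values s = 3, n = 3, k = 8, p = p' = 0 (chosen for illustration).
candidate = ss2_design(s=3, n=3, k=8)
total_plots = candidate["replicates"] * candidate["treatments"]
```

With these trial values the candidate has 216 plots, which agrees with the size the text later quotes for the smallest SS II design.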

In Table I, all the numbers shown are non-negative integers and are subject
to the restrictions described in the classification above of designs of type P.
In certain families further restrictions are necessary on the values of the parameters
by virtue of the first of the non-existence theorems (Freeman, 1957c); thus
in any family constructed from S(i) the number of treatments in the O:PP design
must be greater than or equal to the number of rows plus the number of
columns, while in family SS III p and p' cannot both be zero. In families SS VIII
and SS IX, in order to satisfy the relation $s^2 = 2(t + 1)$, the parameter w is introduced,
where s = 2w and so $(t + 1) = 2w^2$; similarly, in RS XIV and RS XV
to satisfy $s^2 = 2n$, w is again introduced, with s = 2w and $n = 2w^2$. These uses of
w are the only occasions where auxiliary symbols are necessary in this Table.


Useful group divisible designs. Tables II-V give the numerical values of the 
parameters of all useful group divisible designs. Where there is a design with 
the same parameters but simpler than O:PP the O:PP design is not included in 
the appropriate Table. In these Tables, which are derived by putting numerical 
values in Table I, "rows" and "columns" are substituted for the two types of
"blocks" of the basic designs of type P. Rows and columns are chosen not to
correspond to either type of block but merely so that there are never more rows 
than columns, this convention being a help to the writing down of the designs 
themselves. 

It will be seen that by far the greatest number of useful designs occurs in the 
three families SS I, HS I and RS I, the designs from which are enumerated in 
Tables II, III and IV respectively. As no other family has more than five useful
designs, the useful designs in all other families are given in Table V, in which 
many of the parameters are inapplicable in each line. 

Only ten families are represented in Table V, and so Table VI has been constructed
from Table I showing the smallest designs that are theoretically possible
in the other 43 families that have no useful designs. Since even the smallest
of these, those for SS II and RR VI, have 216 plots, while the largest, that for
RS XII, has 8744736 plots, no attempt has been made to discover whether these
designs do in fact exist. None of them appears to be excluded by any of the non-existence
theorems (Freeman, 1957c), and so they are all presumed to be possible;
however none is likely to be practicable. Table VI, like Table V, has many
parameters inapplicable in each line. 

In Tables IV, V and VI, whenever designs derived from the P design R(i)
are given, the auxiliary parameters $\Lambda_1$ and $\Lambda_2$ are shown. This is because two designs,
like RS I 1 and RS I 2 in Table IV, may differ from each other only in respect
of these parameters. No other auxiliary parameters are necessary, as these are
relevant only to the construction of the design, not its final form.


NON-GROUP DIVISIBLE DESIGNS


Like designs of type P, O: PP designs that are group divisible are much more 
numerous than those that are not. Of the other types of P design, only the tri- 
angular and Latin square appear to give rise to O:PP designs; while the other 
types may, none of the designs given by Bose et al. (1954) leads to an O: PP de- 
sign and, at least for cyclic designs, it appears unlikely that any could. Since in 
an O:PP design the association scheme is the same both ways designs that are 
either triangular or Latin square one way are the same the other, the association 
schemes being unique for each type. 


Triangular designs. The basic triangular design has r replicates of n(n — 1)/2 
treatments on rn(n — 1)/2k blocks of k plots each. If n = 2 the design has only 
TABLE V


Other useful group divisible designs 


Design sitim ini kik’|c j|As\c’|¥3| R | Ar} As |R’ Af AS) ul hi! r ip p’iq a’ o Tr to 

=< aio 

SSVI 1 9 2 “| : -' 0,011; 8} 18 | 12] 12 
88 VII 1 2 6\/2/5 -|001, 1) &j|12 6 10 
SSVII 2 g/2/7 00} 1/1) 7/16] 8/14 
HH 1 2| 4 2/3/|2 -/1)1) 1/1518) 8| 12] 12 
HH 2 -{| 4) 81 « 113/41 11/1) 1/18! 8} 12] 12 
HSV 1 1; 7/2 1/5 011110! 14) 7. 20 
HSVI 1/-/1! 7/2 1/2 ool 111) 4/14] 7] 8 
HS VI 2 2/11 /2 1 : 0011 6/22/11! 12 
RRI 1 2\2\2/2|- bi ¥) S181 sts -|2\ 2) 111) 25! 4| 10! 10 
RRI 2 2\2!2\2 s| 1] 2/5 oc -| 2) 2/1/1125] 4/10] 10 
RRI 3 2)2/\2/2|- “1 a) oP ater eis -| 2) 2) 111128! 4! 10) 10 
RHI 1 213/3 1} 1 7| 412 1} 2) 1) 1' 21 | 6| 9 | 14 
RHI 2 2 2/4 s| 5| 2 12}11 2! 6! 9/16 
RH I ; $24 1i1 14) 10 ‘ 03:11 14 6 4 21 
RH I ‘ s 214 14 Is 100 11 0 41/1/18) 6) 4) 27 
RHI 5 pial 1/1 IS 14 10 0 4/1, 1)18| 6| 4) 27 
RS VI 1 2)4/1 1) 2 0 11,3; 9| 8] 6/12 
RS XIII 1! (;2/)1 O11 1 & 8 4 10 
RS XIII 2 42/1 1) 1) 1) 3115 | 8 | 10 | 12 
RS XVII 1 blaj2i1 11615 15| 8/10) 12 
one treatment, if n = 3 the design is one in balanced incomplete blocks and if
n = 4 it may be regarded as a group divisible design with 3 groups of 2 treatments
each; hence for practical purposes n > 4. Further, in addition to the possibilities
of extension by adding p complete replicates to each block and repeating the
whole design, a times say, complete replicates of balanced incomplete block
designs may be added. Thus, in fully extended form, the basic triangular design
is as follows:

$\dfrac{q[2k + pn(n - 1)]}{2k}$, $\dfrac{n(n - 1)}{2}$, $\dfrac{qn(n - 1)}{2k}$, $\dfrac{2k + pn(n - 1)}{2}$,

where $q = ar + AR$,


ants _.. 
ph iat po a eS 
(n — 1)(n — 2) 


Thus the triangular O:PP design has the following form:

TT: $\dfrac{q[2k + pn(n - 1)]}{2k}$, $\dfrac{n(n - 1)}{2}$, $\dfrac{qn(n - 1)}{2k}$, $\dfrac{q'n(n - 1)}{2k'}$,

where

$q = \dfrac{k[2k' + p'n(n - 1)]}{n(n - 1)}$, $q' = \dfrac{k'[2k + pn(n - 1)]}{n(n - 1)}$, $n > 4$,

and there are the further restrictions above on q and similarly on q'.
TABLE VI
Smallest possible designs in families with no useful members 


7 | ! 
Family s\w 





t\m n |kik’}c}| Ae | R jAi] At julhju’/h’iri ey} p |p’ | qaiq Rep. | Tr. IR ws | s 

SS II 8 |- |- 9 3 18 - -|-|- - 0; 0 2 1 8 27 | 9 24 
SS III 3i-|-| 9] 2 - -| -|- 233s 32 1g} 24 24 
SS IV si-i-| 9] 3\s 0} o| 2] 2 16] 22] 86] 24 
SS V eli] 0) SEE ss o} 1/2] 1) 16] as} 12] 24 
SS VIII |-|2|-| 16) 3 2} 1| 2] 6] 150| 48! 60] 120 
SS IX -|2|-| 16] 5 | oi 84 Si 8 30 80; 40] 60 
ss X -|-]2} 6] 2\|-|-] - | si sis 75 12] 30 30 
Ss XI -l-|a] 7] 7]2 - \- | 11! o, 1110! 10] 49! 71 7 
SS XII -|- |i} 7] 7 - |- | o}o} 3/3| o9| 49] a] 21 
SSXI-f- [1] 7] 7 ]1|- -| - - |- 1] 0] 1] 11 11 a9} 7] 77 
ssXIV- f-|- jt} 7] 7} - 0; 0] 3] 4] 12] 49] a] 28 
SS XV -|-l1} 7] 7 -| - - | -|-|-| | oO} 4] 4] 16] 49] 28) 28 
HS II ci} es 1} 4] -| -FFF a5 41 843) 21s) ei Ss 
HS III 3 |- 9] 3 1] 5 1} 1] 1] 3] 6of 27] 36] 45 
HS IV -|-|2| 6| 3 2] 12 it 23373 45 18} 27| 30 
RR II - 2; 4/4 11 |7 3 1 \2 St 1 l 1 33 sj 12] 2 
RR III - 2; 9 1 i {3 18 | 18 2 2 272; 18 336 | 336 
RR IV 4| 2/2 9/3| 1 cl ete 4a 5 | 8| 10] 36 
RRV - 4} 2 . 2} 2] 2] 2] 80 8| 20] 20 
RR VI |- 2| 4(|2/4 9 5 3 3 oe | 1 27 8| 12 18 
RR VII - 2) 614 1 |3 5 3/3} 3] 1] 150 12} 40| 45 
RR VIII s/- |- 2) 6 \4 {3 |- | 5\5| 3! 1/ 1] 2] So} 12] 15 40 
RH II - 2| 6 31 5 1 2 hi oi si 4 50 12} 2) 30 
RH II -|/- 4} 2 1} 3 ie) cio) st ot eel ol el ss 
RH IV -|-l-| 3] 6 {2 31 5 5 ci #1. 443 50| 18 2} 45 
RS II 3\-|-| 9} 31]9 22 7 - -} 1] 7 3 3 | 264] 27 36 | 198 
RS III 3|-|-| 9] 3\9 17 \8| 5] 1} 5| 3| 3] 204] 27] 36] 153 
RS IV -\-|2} 6] 216 9|5| 4 2/1/11] 3 45 12 | is | 30 
RS V -|-]1| 7] 217 17 |6| 8 |-|- - |- ei st at i 17 14 7| 34 
RS VI ni ot] $F | 25 16 | 12 |- |- |- |- | ei St ahs 25 14 7| 50 
RS VIII 3 |- 9] 9 8 |3 |- |- | 1} 9] 1/13] 1456} 8st] 156| 756 
RS IX 3 9| 9 -| - 8 |3 -|-| 1/18] 2] 13 | 2912 81 | 156 | 1512 
RS X -|-|2] 6| 4 . 5 2 -| 2] 1] 1] 7] 105] 21 361] 70 
RS XI -|-|1] 7] 14 - - 16 |7 |- 0 |735 | 3 | 13 \66924 98 | 91 |72072 
RS XII l- |-|1 | 7] 14 - 6 |7 |- |- |- |- | © |980 | 4 | 13 [89232 | 98] 91 |96096 
RS XIV - |2/- | 16 8 - | 3117) 12} 20 | 6900) 128 400 | 2208 
RS XV 2 16 8 3 5 4 | 20 | 2300 128 400 736 
RS XVI 3] 8| 4 5| 5| 4/12] 924 32 | 168 | 176 
RS XVIII 3 9| 5|3 2 rl el sia 64 45 48 | 60 
RS XIX 3 9} 5 (3 2 ti 1%) 4 32 45| 30) 48 
RS XX 2} 6| 4/2 3 si tiz2is 75 24) 36) 50 
RS XXI 1 7 | 10 |7 3 1 0 1j il 33 70 | 30 77 
RS XXII 1} 7) 97 2j-| 1] O;] 2) 10 40 63 | 36 70 

TABLE VII 
Useful designs in family TT 

Design nk Y ria R|Al/ria’!|R’\A’| pj p’i aq q’ | Rep.| Tr. Rows | Cols 

TT 1 Sia] & |] 2) t] @1 @i @1 216} 21 aie! 2)| 2 12 | 10 5 | 24 
IT? 5 | 4 5 fi a) eo) eo] &! €] @1 Ot 8) @] 2] a 12 10 5 24 

rT 3 5 5 6 3 1 0 0 ; 1 0 0 0 0 3 3 3 10 5 6 

a 5| 5| 6 bi ot @] Oi Si £1 Ol @i £1 6) Si B 15 10 6 | 2% 

rT 5 5 5 6 H 1 0 0 3 5 0 0 2 0 15 15 10 6 | 2 

IT 6 §| 5 6 3 1; 0; 0| 6] 1j 9] 1 si 4 15 15 10 6 | 2 

ces 6 5 6 2 1 0 0 4 2 0 0 1 0 2 8 8 15 6 20 

rts 6 6 10 4 1 0 0 1 1 0 0 0 0 } 4 4 15 6 | 10 


| 
The useful designs in family TT are given in Table VII, where it is necessary
to include the auxiliary parameters r, a, R, A, r', a', R', A', to differentiate between
designs otherwise identical. It will be seen that in all of them the basic
triangular designs are singly linked blocks (SLB), r = 2, k = n − 1, or their
complement, r = n − 2, k = (n − 1)(n − 2)/2, one way and doubly linked
blocks (DLB), r = n − 2, k = n, the other way.

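These values of r and k give rise to the following triangular designs:

SLB: $\dfrac{q(pn + 2)}{2}$, $\dfrac{n(n - 1)}{2}$, $\dfrac{qn}{2}$, $\dfrac{(pn + 2)(n - 1)}{2}$

Complement of SLB: $\dfrac{q(pn + n - 2)}{n - 2}$, $\dfrac{n(n - 1)}{2}$, $\dfrac{qn}{n - 2}$, $\dfrac{(pn + n - 2)(n - 1)}{2}$

DLB: $\dfrac{q(pn - p + 2)}{2}$, $\dfrac{n(n - 1)}{2}$, $\dfrac{q(n - 1)}{2}$, $\dfrac{n(pn - p + 2)}{2}$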


Thus, for an O:PP design to be SLB both ways or SLB one way and its complement
the other, $qn = (p'n + 2)(n - 1)$ or $qn = (p'n + n - 2)(n - 1)$ respectively;
it is easily seen that neither of these equations has any integral solutions
for n > 2. If an O:PP design is the complement of SLB both ways then
$2qn = (p'n + n - 2)(n - 1)(n - 2)$. Now n has no factor in common with (n − 1)
nor, if odd, with (n − 2); thus n/2 is integral, = x say, and
$qx = (p'x + x - 1)(2x - 1)(x - 1)$, which is impossible for x > 1, i.e., for n > 2. For an O:PP
design to be DLB both ways $q(n - 1) = n(p'n - p' + 2)$, which is impossible
for n > 3.

Thus, no O:PP designs can be SLB or its complement both ways, neither can 
they be DLB both ways, but there is no reason why designs of either kind should 
not fit with other triangular designs to make an O:PP design. 
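The four impossibility statements above can be spot-checked by brute force. The following sketch is only a finite check over a hypothetical range of n and p', not a proof; the case labels and function name are invented for the example, and only integrality of q is tested (the further restrictions on q are ignored).

```python
def has_integral_q(case, n, p_dash):
    """True if the relation for the given pairing admits an integer q."""
    if case == "SLB/SLB":                    # q n = (p'n + 2)(n - 1)
        coeff, rhs = n, (p_dash * n + 2) * (n - 1)
    elif case == "SLB/complement":           # q n = (p'n + n - 2)(n - 1)
        coeff, rhs = n, (p_dash * n + n - 2) * (n - 1)
    elif case == "complement/complement":    # 2 q n = (p'n + n - 2)(n - 1)(n - 2)
        coeff, rhs = 2 * n, (p_dash * n + n - 2) * (n - 1) * (n - 2)
    elif case == "DLB/DLB":                  # q (n - 1) = n (p'n - p' + 2)
        coeff, rhs = n - 1, n * (p_dash * n - p_dash + 2)
    else:
        raise ValueError(f"unknown case {case!r}")
    return rhs % coeff == 0


# Smallest n for which each pairing is claimed to be impossible.
first_n = {"SLB/SLB": 3, "SLB/complement": 3, "complement/complement": 3, "DLB/DLB": 4}

counterexamples = [
    (case, n, p_dash)
    for case, start in first_n.items()
    for n in range(start, 61)
    for p_dash in range(0, 61)
    if has_integral_q(case, n, p_dash)
]
```

Over this range `counterexamples` is empty, in line with the divisibility arguments in the text; for DLB both ways with n = 3 an integral q does exist, which is why that statement is restricted to n > 3.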


Latin square designs. The basic Latin square design has r replicates of $n^2$
treatments on $rn^2/k$ blocks of k plots each but, as for the triangular design, it can
be extended by adding on complete replicates to each block, repeating the whole
design and adding on balanced incomplete blocks. Thus, in fully extended form,
the basic Latin square design is:

$\dfrac{q(k + pn^2)}{k}$, $n^2$, $\dfrac{qn^2}{k}$, $k + pn^2$, where $q = ar + AR$, $A = \dfrac{R(k - 1)}{n^2 - 1}$.

Thus the Latin square O:PP design has the following parameters:

LL: $\dfrac{q(k + pn^2)}{k}$, $n^2$, $\dfrac{qn^2}{k}$, $\dfrac{q'n^2}{k'}$, where $q = \dfrac{k(k' + p'n^2)}{n^2}$, $q' = \dfrac{k'(k + pn^2)}{n^2}$,

and there are the further restrictions above on q and similarly on q'.
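The parameters of a Latin square O:PP candidate follow from n, k, k', p and p'. The sketch below assumes the relations $q = k(k' + p'n^2)/n^2$ and $q' = k'(k + pn^2)/n^2$, obtained by equating the number of blocks one way with the plots per block the other way; the function name and dictionary keys are hypothetical, and the further restrictions on q and q' are not checked.

```python
def latin_square_opp(n, k, k_dash, p=0, p_dash=0):
    """Parameters of an LL candidate, or None if q or q' is not an integer."""
    v = n * n  # number of treatments
    q_num = k * (k_dash + p_dash * v)
    qd_num = k_dash * (k + p * v)
    if q_num % v or qd_num % v:
        return None
    blocks_first = k_dash + p_dash * v   # equals q n^2 / k
    blocks_second = k + p * v            # equals q' n^2 / k'
    return {
        "q": q_num // v,
        "q_dash": qd_num // v,
        "replicates": blocks_first * blocks_second // v,
        "treatments": v,
        "blocks_first": blocks_first,
        "blocks_second": blocks_second,
    }


# The values quoted in the text for the useful LL designs.
ll = latin_square_opp(n=3, k=6, k_dash=6)
```

For n = 3, k = k' = 6 and p = p' = 0 this gives q = q' = 4 and 4 replicates of 9 treatments on 6 rows and 6 columns, as stated for family LL.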


There are only two useful designs in the family LL. Both have n = 3,
k = k' = 6, p = p' = 0 and so q = q' = 4, while a = a' = 1, r = r' = 4,
A = A' = R = R' = 0, thus leading to designs with 4 replicates of 9 treatments
on 6 rows and columns. The only difference between the designs is that in one
first associates concur three times and second associates twice in both rows and
columns, while in the other first associates concur three times in rows and twice
in columns and conversely for second associates. Thus, if the designs are LL 1
and LL 2 respectively, then, using the notation previously adopted (Freeman,
1957b), in LL 1

$\lambda_1 = \mu_1 = 3$, $\lambda_2 = \mu_2 = 2$, and in LL 2 $\lambda_1 = \mu_2 = 3$, $\lambda_2 = \mu_1 = 2$.


Since, for LL 2, », = » = 30, in the same notation, the design has equal efficiency 
for both types of associates, and it is the only useful O:PP design with this 
property. 


Summary. All known families of O: PP designs with two associate-classes are 
classified, these including all with at least one member of practicable size, i.e., 
having more than two replicates or treatments, not more than 30 replicates, 
treatments, rows or columns, and not more than 150 plots in all. The designs 
within these limits are tabulated in their families and where a family has no 
practicable design its smallest member is given. 


Acknowledgements. I should like to express my thanks to Dr. N. L. Johnson of
University College, London, for his advice throughout the preparation of this 
paper, particularly with regard to the tabulation of the designs. 
MOST ECONOMICAL MULTIPLE-DECISION RULES¹


By Wm. Jackson Hall
University of North Carolina 


0. Summary. This paper is concerned with non-sequential multiple-decision 
procedures for which the sample size is a minimum subject to either (1) lower 
bounds on the probabilities of making correct decisions or (2) upper bounds on 
the probabilities of making incorrect decisions. Such decision procedures are 
obtained by constructing artificial decision problems for which the minimax 
strategies provide solutions to problems (1) and (2). These are shown to be 
"likelihood ratio" and "unlikelihood ratio" decision rules, respectively. Thus,
although problems (1) and (2) are formulated in the spirit of the classical Ney- 
man-Pearson approach to two-decision problems, minimax theory is used as a 
tool for their solution. 

Problems of both "simple" and "composite" discrimination are considered
and some examples indicated. (Some multivariate examples are given in [4].) 
Various properties of the decision rules are derived, and relationships with works 
of Wald, Lindley, Rao and others are cited. 


1. Simple discrimination. 

A. Formulation of the problem. We are concerned with a sequence $X_1, X_2, \cdots$,
of real- or vector-valued, independent, and identically distributed random
variables, each having a density function $f$, belonging to some specified class
$\Omega$, w.r.t. a fixed measure $\mu$.

The decision problem is to formulate a rule for choosing a non-negative integer
n (completely non-random), and, after taking an observation
$x = (x_1, \cdots, x_n)$ on $X = (X_1, \cdots, X_n)$, for choosing one of m possible alternative decisions
$A_1, \cdots, A_m$. A multiple decision rule (m-d.r.) for choosing among $A_1, \cdots, A_m$
on the basis of x is defined by an ordered set of non-negative, real-valued,
measurable functions $\phi(x) = [\phi_1(x), \cdots, \phi_m(x)]$ on the space $\mathcal{X}$ of x
such that $\sum_j \phi_j = 1$ identically in x (for n = 0, the $\phi_j$'s are constants). $A_j$ is
then chosen with probability $\phi_j(x)$ when x is observed. For non-randomized
d.r.'s (all $\phi_j$'s equal 0 or 1), the $\phi_j$'s are characteristic functions of mutually
exclusive and exhaustive "acceptance" regions $R_1, \cdots, R_m$ in $\mathcal{X}$, where $A_j$ is
accepted if $x \in R_j$.
A subscript or superscript n denotes the corresponding sample size; $f^n(x)$ and
$\mu^n$ are the joint density and product measure, respectively.

We suppose throughout Section 1 that $\Omega$ consists of a finite number, say $l$,
of elements $f_1, \cdots, f_l$; we say that the corresponding decision problem is one
of "simple discrimination" and a d.r. is a d.r. for "discriminating among $f_1, \cdots, f_l$."
Here, if $\mu$ is non-atomic, only non-randomized d.r.'s need be considered [2].

A d.r. $D = D_n$ is characterized by the functions

$p_{ij}(D) = \Pr(D \text{ chooses } A_j \mid f_i) = \int_{\mathcal{X}} \phi_j(x) f_i^n(x)\, d\mu^n$  $(i = 1, \cdots, l;\ j = 1, \cdots, m)$.

We consider two different criteria for choosing a d.r. for simple discrimination.
The first assumes that $l = m$ and that the decision $A_i$ is to be preferred when
$f_i$ is true. Denote $p_{ii}(D) = p_i(D) = 1 - q_i(D)$, so that $p_i$ is the probability of a
"correct" decision and $q_i$ the probability of an "incorrect" decision when $f_i$ is
true.

DEFINITION 1. Let $\alpha = (\alpha_1, \cdots, \alpha_m)$ be a given vector of positive constants
each less than one. A d.r. $D_N$, based on a sample of size N, is said to be a most
economical m-decision rule relative to the vector $\alpha$ for discriminating among
$f_1, \cdots, f_m$ if it satisfies

(1) $p_i(D) \ge \alpha_i$  $(i = 1, \cdots, m)$

and if N is the least integer n for which (1) may be satisfied by some m-d.r. $D_n$
based on a sample of size n. N is said to be the most economical sample size.

We now no longer require that $l = m$, but suppose that corresponding to each
$f_i$ one or more of the alternatives $A_j$ is preferable, or "correct," when $f_i$ is true.

DEFINITION 2. Let $\beta = (\beta_{ij})$ be a given $l \times m$ matrix of positive constants
such that for every i, j pair for which $A_j$ is a correct decision when $f_i$ is true
$\beta_{ij} = 1$. A d.r. $D_N$, based on a sample of size N, is said to be a most economical
m-decision rule relative to the matrix $\beta$ for discriminating among $f_1, \cdots, f_l$
if it satisfies

(2) $p_{ij}(D) \le \beta_{ij}$  $(i = 1, \cdots, l;\ j = 1, \cdots, m)$

and if N is the least integer n for which (2) may be satisfied by some m-d.r. $D_n$
based on a sample of size n. N is said to be the most economical sample size.

If $l = m$ and $A_i$ is preferred when $f_i$ is true, then an M.E. d.r. relative to $\beta$
also controls the probabilities of correct decisions if $\sum_{j \ne i} \beta_{ij} < 1$ for all i.

If $l = m = 2$, both (1) and (2) reduce to upper bounds on the probabilities
of the two kinds of error, and Definitions 1 and 2 define an M.E. 2-d.r. as one
with minimum sample size subject to these bounds.
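For the two-decision case a most economical sample size can be computed directly in a simple setting. The sketch below is an illustration under assumptions not taken from the paper: the two densities are normal with unit variance and means differing by `delta`, and the rule is a cutoff on the sample mean (the likelihood-ratio form, which the Neyman-Pearson lemma shows is enough to consider here). A cutoff achieving $p_1 \ge \alpha_1$ and $p_2 \ge \alpha_2$ exists exactly when $\sqrt{n}\,\delta \ge z(\alpha_1) + z(\alpha_2)$, where $z$ is the standard normal quantile function; for n = 0 this reduces to $\alpha_1 + \alpha_2 \le 1$, the condition for a purely random choice to suffice.

```python
from math import sqrt
from statistics import NormalDist


def most_economical_n(alpha1, alpha2, delta, n_max=10_000):
    """Least n with sqrt(n) * delta >= z(alpha1) + z(alpha2), for two unit-variance
    normal densities whose means differ by delta (an illustrative assumption)."""
    z = NormalDist().inv_cdf
    required = z(alpha1) + z(alpha2)
    for n in range(n_max + 1):
        if sqrt(n) * delta >= required:
            return n
    raise ValueError("no sample size up to n_max satisfies the bounds")


n_strict = most_economical_n(0.95, 0.95, delta=1.0)
n_close_means = most_economical_n(0.95, 0.95, delta=0.5)
n_trivial = most_economical_n(0.3, 0.6, delta=1.0)
```

Halving the separation of the means roughly quadruples the required sample size, and when the two lower bounds sum to at most one no observations are needed at all.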

It is intuitively clear (and elementary to prove) that a necessary and sufficient
condition for the existence of a M.E. m-d.r. relative to any $\alpha$ or $\beta$ ($l = m$)
is that there exist uniformly consistent sequences of 2-d.r.'s for discriminating
between every pair $\omega_i, \omega_j$ ($i \ne j$) (5).

We shall utilize elements of Wald's theory of decision functions as given in
[14], and shall use in particular some of the results of Sections 3.5 and 5.1.1,
altering his notation slightly. The differences in the "data of the decision problem"
assumed by Wald and here are only minor.

FIG. 1

Let $\mathcal{D}_n$ denote the class of all m-d.r.'s based on a sample of size n (n = 0, 1, 2,
$\cdots$). Clearly, for $n \le N$, $\mathcal{D}_n \subseteq \mathcal{D}_N$; Lemma 1 follows almost immediately.


LEMMA 1. For every fixed sample size n = 0, 1, 2, $\cdots$, let $D_n^0$ be a minimax d.r.
and denote $r_n = \max_i r(f_i, D_n^0)$, where $r(f_i, D_n)$ is the risk w.r.t. some bounded
loss function. Then the sequence $\{r_n\}$, n = 0, 1, 2, $\cdots$, is a non-increasing sequence.


B. Most economical decision rules relative to a vector a. Later in this section, we 
shall apply Wald’s theory to two specific loss functions and develop in each case a 
method of obtaining M.E. d.r.’s as defined by Definition 1. First, we motivate 
geometrically the selection of such loss functions so as to identify the minimax 
strategy with the desired one. This alternative approach may give some geo- 
metrical insight into the properties of the d.r.'s obtained.²

For fixed n, let $p(D) = (p_1(D), \cdots, p_m(D))$ denote a point in m-space, and
$P_n = \{p(D): D \in \mathcal{D}_n\}$. It can be shown ([2], [10]) that $P_n$ is a convex body in
the unit m-cube U containing all corners of U with coordinates summing to
unity. The case m = 2 is illustrated. (Conditions under which $P_n$ is a proper
subset of $P_{n'}$ for $n < n'$ and for which $P_n$ tends with increasing n to U are given
elsewhere ([3], [5]).)


2 The author is indebted to the referee for considerable improvement of this
geometric presentation.

In the diagram, the point $\alpha \notin P_n$; therefore $n$ is smaller than the
required sample size. The M.E. sample size is the smallest $n$ for which
$\alpha \in P_n$, in which case the points $p'$, $p''$ and $\alpha$ coincide
(approximately). To test whether or not $\alpha \in P_n$, we can examine the
position of the points $p'$ or $p''$ relative to the position of $\alpha$.

The points on the "upper" surface of $P_n$ ($n$ fixed) include all points
$p(D^*)$ corresponding to Bayes strategies $D^*$ when the loss function is

(3)    $W(f_i, A_j) = W_{ij} = -\delta_{ij}/\alpha_i \qquad (i, j = 1, \dots, m),$

where $\delta_{ij}$ denotes the Kronecker $\delta$-function.³ Then the risk
w.r.t. $W_{ij}$ is

(4)    $r(f_i, D) = -\sum_{j=1}^{m} \delta_{ij}\, p_{ij}(D)/\alpha_i = -p_i(D)/\alpha_i \qquad (i = 1, \dots, m).$


If $p'$ is not on the boundary of $U$, the least favorable distribution for the
weight function $W_{ij}$ will positively weigh each element in $\Omega$. (This
will occur if the region in $\mathcal{X}$ of positive density is constant over
$\Omega$.) In this case, the minimax strategy $D_n^0$ will be such that
$p(D_n^0)$ is on the line $L_1$ of constant risk; i.e., $p(D_n^0) = p'$.
(To obtain the minimax geometry with the loss function $W_{ij}$, transform the
diagram by dividing the $i$th coordinate by $-\alpha_i$; then the convex body
$P_n$ goes into the convex body of risk points $(r_1, \dots, r_m)$,
$r_i = r(f_i, D)$.)

Alternatively, the "upper" surface of $P_n$ corresponds to Bayes strategies when
the loss function is

(5)    $W^*(f_i, A_j) = W^*_{ij} = (1 - \delta_{ij})/(1 - \alpha_i) \qquad (i, j = 1, 2, \dots, m)$

and the risk function is $r^*(f_i, D) = q_i(D)/\beta_i$, where
$\beta_i = 1 - \alpha_i$. The least favorable distribution will likewise
positively weight each element of $\Omega$ whenever $p''$ is not on the boundary
of $U$. In this case, the minimax strategy $D_n^*$ will be such that $p(D_n^*)$
is on the line $L_2$ of constant risk; i.e., $p(D_n^*) = p''$. (To obtain the
minimax geometry with $W^*_{ij}$, transform the diagram by dividing the $i$th
coordinate by $1 - \alpha_i$, again transforming $P_n$ into the convex body of
risk points.) This latter approach is similar to that used by Rao [11] for
problems of classification in multivariate analysis.

When $l = m > 2$, there is an added complication for the latter loss function
since the line ($L_2$) from $(1, 1, \dots, 1)$ through $\alpha$ need not
necessarily pierce $P_n$ for $n < N$, the M.E. sample size. (Of course, if
$\alpha \in P_n$, then the line certainly pierces $P_n$.) Thus the components of
a least favorable distribution are not necessarily positive unless $n \ge N$ and
$p''$ is in the interior of $U$.

Thus, in one instance, minimax rules maximize the common ratio
$p_1/\alpha_1 = \cdots = p_m/\alpha_m$ and, in the other, minimize the common
ratio $q_1/\beta_1 = \cdots = q_m/\beta_m$. The M.E. sample size is the smallest
one for which the common ratio is $\ge 1$ or $\le 1$, respectively. We now
formalize these results. (Wald's Theorem 5.3⁴ asserts the existence of a minimax
d.r. $D^0$ for any (fixed) sample size.)





3 This loss function satisfies Wald’s requirements although it is not necessarily zero when 
a correct decision is made nor necessarily positive otherwise, as intuitively suggested, but 
never required mathematically, by Wald. 

4 All references to Wald refer to [14] unless otherwise specified.

THEOREM 1. For each $n = 0, 1, 2, \dots$, let $D_n^0$ be a minimax d.r. w.r.t.
the weight function (3) for samples of fixed size $n$. Suppose for some $n$,

(6)    $\max_i r(f_i, D_n^0) \le -1$

and let $N$ be the least such integer. Then $D_N^0$ is an M.E. d.r. relative to
the vector $\alpha$ for discriminating among $f_1, \dots, f_m$. Conversely, if
there exists an M.E. d.r. relative to $\alpha$ for discriminating among
$f_1, \dots, f_m$, and the M.E. sample size is $N$, then $D_N^0$ is an M.E. d.r.

PROOF. From (4) and (6), it follows that $D_N^0$ satisfies (1). Now suppose for
some $n < N$, there exists a d.r. $D_n$ satisfying (1). Since $D_n^0$ is
minimax, $\max_i r(f_i, D_n^0) \le \max_i r(f_i, D_n) = \max_i [-p_i(D_n)/\alpha_i]$.
Since $D_n$ satisfies (1), we have from above that
$\max_i r(f_i, D_n^0) \le -1$, in contradiction to the fact that $N$ is the
least integer $n$ for which this is true. Hence, $D_N^0$ is an M.E. d.r.

To prove the converse, suppose $D_N$ is an M.E. d.r. Then

$-1 \ge \max_i [-p_i(D_N)/\alpha_i] = \max_i r(f_i, D_N) \ge \max_i r(f_i, D_N^0)$

since $D_N^0$ is a minimax d.r. Hence, (6) is satisfied for $n = N$, and since
$N$ is the M.E. sample size, $D_N^0$ is an M.E. d.r.

Lemma 1 assures us that any $n$ for which (6) is violated is too small. Now let
us consider the structure of minimax d.r.'s for a fixed sample size $n$.

DEFINITION 3. A d.r. $D$ defined by $\phi(x)$ is said to be a likelihood ratio
d.r. if there exist positive constants $a_1, \dots, a_m$ such that for any $j$
and any $x$ for which $\phi_j(x) > 0$, $a_j f_j^n(x) \ge a_i f_i^n(x)$ for all
$i \ne j$.

(Note that $a_1, \dots, a_m$ determine $\phi$ completely except in sets of $x$
for which $a_j f_j^n(x) = \max_i a_i f_i^n(x)$ for more than one value of $j$.)
Setting $a_i = \xi_i/\alpha_i$, where $\xi = (\xi_1, \dots, \xi_m)$ is an a
priori distribution over $\Omega = (f_1, \dots, f_m)$, it follows from Wald's
Theorem 5.1 (with (5.6) replaced by (5.7)) that a Bayes d.r. relative to any
$\xi$ for which all $\xi_i > 0$ is a likelihood ratio d.r., and conversely.
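
As an illustration of Definition 3, the following is a minimal Python sketch of a non-randomized likelihood ratio d.r. for independent observations. The function name `likelihood_ratio_rule`, the normal densities and the constants are assumptions made for the example and do not appear in the paper; ties, where the definition permits randomization, are broken here in favor of the smallest index.

```python
from math import prod
from statistics import NormalDist


def likelihood_ratio_rule(x, densities, a):
    """Return the (0-based) index j maximizing a[j] * f_j^n(x).

    x         -- list of observations (assumed independent)
    densities -- list of one-observation density functions f_1, ..., f_m
    a         -- positive constants a_1, ..., a_m of Definition 3
    Ties are resolved by taking the smallest index (an arbitrary choice).
    """
    scores = [a_j * prod(f(x_i) for x_i in x) for a_j, f in zip(a, densities)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: three unit-variance normal densities with means -1, 0, 1.
densities = [NormalDist(mu, 1.0).pdf for mu in (-1.0, 0.0, 1.0)]
a = [1.0, 1.0, 1.0]
decision = likelihood_ratio_rule([0.9, 1.2, 0.7], densities, a)
```

With equal constants the rule simply picks the density of largest likelihood; unequal constants shift the boundaries between the decision regions.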

Wald's Theorem 5.3 asserts the existence of a minimax d.r. and a least favorable
distribution, and that any minimax d.r. is a Bayes d.r. relative to any least
favorable distribution. Moreover, it follows from (4) and Wald's Theorem 5.3
(iii) that if all components of a least favorable distribution are positive, any
minimax d.r. $D^0$ has the property:

(7)    $p_1(D^0)/\alpha_1 = \cdots = p_m(D^0)/\alpha_m.$

We shall give sufficient conditions for this to be true.

ASSUMPTION 1. If $R$ is a subset of $\mathcal{X}$ for which
$\int_R f_i^n(x)\, d\mu^n = 0$ for some $i$, then $\int_R f_i^n(x)\, d\mu^n = 0$
for all values of $i$. (Whenever this assumption is made, we shall tacitly
assume that $\mathcal{X}$ is redefined so that $f_i^n(x) > 0$ for all $i$ and
$x \in \mathcal{X}$.) We state a theorem analogous to Wald's Theorem 5.4;⁵ the
proof (not given) is also analogous.


5 It might be noted that Wald's condition (iii) of Theorem 5.4 is superfluous
since it is always fulfilled; e.g., in Wald's notation, let $\delta_i = 1/u$
($i = 1, \dots, u$) identically in $x$, and then $r(F_i, \delta) = (u - 1)/u < 1$
for $i = 1, \dots, k$.

THEOREM 2. If Assumption 1 holds, all components of a least favorable
distribution $\xi$ w.r.t. the weight function $W_{ij}$ are positive.

Hence, under Assumption 1, an M.E. d.r. may be obtained by the following
method: for each sample size $n$, find a likelihood ratio d.r. $D_n^0$ for the
constants $a_1, \dots, a_m$ determined by Eqs. (7), and then choose $N$ as the
minimum $n$ for which $p_1(D_n^0) \ge \alpha_1$.

As an alternative approach, we can consider the weight function $W^*_{ij}$ and
proceed analogously to the first approach, giving a theorem identical to Theorem
1 with (6) replaced by $\max_i r(f_i, D_n^0) \le 1$; and, replacing
$a_i = \xi_i/\alpha_i$ by $\xi_i/\beta_i$, it follows analogously that a Bayes
d.r. relative to any $\xi$ for which all $\xi_i > 0$ is a likelihood ratio d.r.,
and conversely. Moreover, if all components of a least favorable distribution
are positive, any minimax d.r. $D^0$ has the property:

(8)    $q_1(D^0)/\beta_1 = \cdots = q_m(D^0)/\beta_m.$

We shall give sufficient conditions for this to be true. Analogously to Wald's
Theorem 5.4, we have:

LEMMA 2. If Assumption 1 holds, and if there exists some d.r. $D$ for which

$r(f_i, D) < 1/\max_{1 \le j \le m} \beta_j \qquad (i = 1, \dots, m),$

then all components of a least favorable distribution are positive.
The following lemma may be useful in this regard:

LEMMA 3. If $\beta_i < [1/(m - 1)] \sum_{j=1}^{m} \beta_j$ (i.e.,
$\alpha_i > [\sum_j \alpha_j - 1]/[m - 1]$) for all $i$, then there exists a
d.r. $D$ for which $r(f_i, D) < 1/\max_j \beta_j$ for all $i$.
The proof follows by considering a d.r. defined by
$\phi_i(x) = 1 - (m - 1)\beta_i/\sum_j \beta_j > 0$ identically in $x$
($i = 1, \dots, m$).
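
The construction in the proof of Lemma 3 can be checked numerically. The sketch below is an illustration only: the function names and the vector `beta` are assumptions, and the risk is computed as $q_i/\beta_i$ with $q_i = 1 - \phi_i$, which holds because the rule ignores the data.

```python
def lemma3_rule(beta):
    """Data-independent randomized rule phi_i = 1 - (m - 1) * beta_i / sum(beta)."""
    m, total = len(beta), sum(beta)
    if any(b >= total / (m - 1) for b in beta):
        raise ValueError("hypothesis of Lemma 3 is violated")
    return [1.0 - (m - 1) * b / total for b in beta]


def risks(phi, beta):
    """Risk q_i / beta_i under the second weight function, with q_i = 1 - phi_i."""
    return [(1.0 - p) / b for p, b in zip(phi, beta)]


# Hypothetical bounds beta_i = 1 - alpha_i for m = 3 decisions.
beta = [0.10, 0.20, 0.15]
phi = lemma3_rule(beta)
r = risks(phi, beta)
```

Every risk equals $(m - 1)/\sum_j \beta_j$, which lies below $1/\max_j \beta_j$ exactly when the hypothesis of the lemma holds.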

THEOREM 3. Suppose Assumption 1 holds. For any sample size greater than or
equal to the M.E. sample size, all components of a least favorable distribution
are positive.

PROOF. Suppose $n \ge N$, the M.E. sample size, and that $D_n^0$ is a minimax
d.r. for samples of size $n$; then, using Lemma 1 and the theorem analogous to
Theorem 1, $D_n^0$ satisfies (1). Use of Lemma 2 completes the proof.

Hence, under Assumption 1, $D_N^0$ is a likelihood ratio d.r., and an M.E. d.r.
may be obtained by considering likelihood ratio d.r.'s $D_n^0$ for each $n$ for
constants $a_1, \dots, a_m$ determined by (8), and then choosing $N$ as the
minimum $n$ for which $q_1(D_n^0) \le \beta_1$. If for some $n$ one of the
components of a least favorable distribution is zero, we know that $n$ is less
than the M.E. sample size (Lemma 1).

A Bayes d.r. relative to any $\xi$ of which all components are positive is
admissible [15]. Hence, any likelihood ratio d.r. is admissible, and under
Assumption 1 M.E. d.r.'s obtained by either of the above approaches are
admissible. Thus, denoting an M.E. d.r. by $D_N^0$, there does not exist a d.r.
$D_N$ for which $p_i(D_N) \ge p_i(D_N^0)$ ($i = 1, \dots, m$) with strict
inequality for at least one $i$ (under Assumption 1).
Suppose now that a real-valued statistic $t = t(x_1, \dots, x_n)$ exists which
is sufficient for the class $\{f_i^n\}$ ($i = 1, \dots, m$), and suppose that
$t$ has a monotone likelihood ratio for some ordering of the elements of
$\Omega$; i.e., if $g_i(t)$ is the density of $t$ corresponding to $f_i^n(x)$,
then, for some ordering of the subscripts,

$g_i(t_1)\, g_j(t_2) \ge g_i(t_2)\, g_j(t_1)$

for $i > j$ and $t_1 > t_2$ [8]. It follows almost immediately that for any
$\phi(x)$ which defines a likelihood ratio d.r. there exist constants $\{c_i\}$,
$-\infty = c_0 \le c_1 \le \cdots \le c_{m-1} \le c_m = \infty$, such that
$\phi_i(x) > 0$ implies $c_{i-1} \le t(x) \le c_i$. Moreover, $\phi_i(x) = 1$ if
the latter inequalities are strict, so that randomization may be required only
at the points $t = c_i$ and only then if such points have positive probability.
Such d.r.'s have been called monotone [1], [8]. If, for example, $f_i$ is of the
exponential type $f_i = \beta(\theta_i) e^{\theta_i x} r(x)$, $r \ge 0$ and
$\theta_i$ real, for all $i$, the above conditions are satisfied [1].

Example 1. Suppose $f_i$ is a normal density function with mean $\theta_i$
($-\infty < \theta_1 < \cdots < \theta_m < \infty$) and known variance
$\sigma^2$. Then $t = \bar{x}$ is sufficient and the $c_i$'s and $N$ may be
obtained by first solving the following equations (iteratively) for the $c_i$'s
and $n$ with $\rho_n = 1$:

(9)    $p_i(D_n) = \Phi[\sqrt{n}(c_i - \theta_i)/\sigma] - \Phi[\sqrt{n}(c_{i-1} - \theta_i)/\sigma] = \alpha_i \rho_n \qquad (i = 1, \dots, m),$

where $\Phi$ denotes the standard normal distribution function, and then,
choosing $N$ to be the least integer $\ge n$, re-solving for the $c_i$'s and
$\rho_N$. Such a monotone rule will be minimax w.r.t. $W_{ij}$ for the M.E.
sample size. Alternatively, (9) may be replaced by equations of the form
$1 - p_i(D_n) = (1 - \alpha_i)\rho_n$, and a solution obtained which will be
minimax w.r.t. $W^*_{ij}$.
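
The equations (9) can be solved numerically. The sketch below treats only the case $m = 3$ and is an illustration under stated assumptions, not the author's procedure: instead of solving for a real-valued $n$ with $\rho_n = 1$ and rounding up, it scans the integers $n = 1, 2, \dots$ and finds $\rho_n$ by bisection, stopping at the first $n$ with $\rho_n \ge 1$. The function names, the bisection bounds and the numerical values are assumptions.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def cutoffs(rho, n, thetas, alphas, sigma=1.0):
    """c_1 and c_2 from the first and third equations of (9) with m = 3."""
    c1 = thetas[0] + sigma * PHI.inv_cdf(alphas[0] * rho) / sqrt(n)
    c2 = thetas[2] + sigma * PHI.inv_cdf(1.0 - alphas[2] * rho) / sqrt(n)
    return c1, c2


def middle_gap(rho, n, thetas, alphas, sigma=1.0):
    """p_2(D_n) - alpha_2 * rho; this is decreasing in rho."""
    c1, c2 = cutoffs(rho, n, thetas, alphas, sigma)
    p2 = (PHI.cdf(sqrt(n) * (c2 - thetas[1]) / sigma)
          - PHI.cdf(sqrt(n) * (c1 - thetas[1]) / sigma))
    return p2 - alphas[1] * rho


def common_ratio(n, thetas, alphas, sigma=1.0, iters=200):
    """Solve (9) for rho_n by bisection on the second equation."""
    lo, hi = 1e-12, (1.0 - 1e-12) / max(alphas[0], alphas[2])
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if middle_gap(mid, n, thetas, alphas, sigma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def me_sample_size(thetas, alphas, sigma=1.0, n_max=10_000):
    """Smallest integer n with rho_n >= 1, i.e. p_i(D_n) >= alpha_i for all i."""
    for n in range(1, n_max + 1):
        if common_ratio(n, thetas, alphas, sigma) >= 1.0:
            return n
    raise ValueError("no sample size up to n_max meets the bounds")


# Hypothetical problem: means -1, 0, 1, unit variance, alpha_i = 0.9.
thetas, alphas = (-1.0, 0.0, 1.0), (0.9, 0.9, 0.9)
N = me_sample_size(thetas, alphas)
```

For this symmetric example the real-valued solution with $\rho = 1$ is $n = [\Phi^{-1}(0.9) + \Phi^{-1}(0.95)]^2 \approx 8.56$, so the scan stops at $N = 9$.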

Other examples may be treated analogously, allowing for randomization in 
the discrete cases if desired. 

C. Most economical decision rules relative to a matrix $\beta$. To obtain M.E.
d.r.'s as defined by Definition 2, we shall construct an artificial decision
problem whose minimax solution will have the properties desired. For
convenience, we replace each $\beta_{ij}$ which is equal to unity by $+\infty$.

Suppose $n$ fixed, and let $\Omega'$ be a set of density functions $g_{ij}$
w.r.t. $\mu$ ($i = 1, \dots, l$; $j = 1, \dots, m$), where $g_{ij} = f_i$
identically in $x$. Define a weight function $W(g_{ij}, A_k) = W_{ijk}$, where

(10)    $W_{ijk} = 1/\beta_{ij}$ if $j = k$ ($i = 1, \dots, l$; $j, k = 1, \dots, m$) and 0 otherwise.

We consider the artificial decision problem of choosing among $A_1, \dots, A_m$
when one of the $l' = lm$ density functions $g_{ij}$ is "true", and where the
"loss" incurred by choosing $A_k$ when $g_{ij}$ is "true" is $W(g_{ij}, A_k)$.
The risk function is $r(g_{ij}, D) = \sum_k W_{ijk}\, p_{ijk}(D)$, where
$p_{ijk}(D) = \Pr(D \text{ chooses } A_k \mid g_{ij}) = p_{ik}(D)$; thus
$r(g_{ij}, D) = p_{ij}(D)/\beta_{ij}$ ($i = 1, \dots, l$; $j = 1, \dots, m$).
Wald's Theorem 5.3 asserts the existence of minimax d.r.'s.

THEOREM 4. For each $n = 0, 1, 2, \dots$, let $D_n^0$ be a minimax d.r. w.r.t.
the weight function (10) for discriminating among $g_{11}, g_{12}, \dots, g_{lm}$
for samples of fixed size $n$. Suppose for some $n$,
$\max_{i,j} r(g_{ij}, D_n^0) \le 1$, and let $N$ be the least such integer.
Then $D_N^0$ is an M.E. d.r. relative to the matrix $\beta$ for discriminating
among $f_1, \dots, f_l$. Conversely, if there exists an M.E. d.r. relative to
$\beta$ and $N$ is the M.E. sample size, then $D_N^0$ is an M.E. d.r.

The theorem may be proved in a similar manner to Theorem 1. Now let us consider
the structure of these minimax solutions w.r.t. $W_{ijk}$.

DEFINITION 4. A d.r. $D$ defined by $\phi(x)$ is said to be an unlikelihood
ratio d.r. if there exist non-negative constants $a_{ij}$ ($i \ne j$;
$i = 1, \dots, l$; $j = 1, \dots, m$), where for each $j$ at least one
$a_{ij} > 0$, such that for any $k$ and any $x$ for which $\phi_k(x) > 0$,
$\sum_{i \ne k} a_{ik} f_i^n(x) \le \sum_{i \ne j} a_{ij} f_i^n(x)$ for all
$j \ne k$.

Setting $a_{ij} = \xi_{ij}/\beta_{ij}$, where
$\xi = (\xi_{11}, \xi_{12}, \dots, \xi_{lm})$ denotes an a priori distribution
over $\Omega'$, we have from Wald's Theorem 5.1 that any Bayes d.r. relative to
$\xi$ is an unlikelihood ratio d.r. and conversely. Lindley [10] introduced such
d.r.'s, obtained by his "method of minimum unlikelihood." Hereafter, we shall
suppose $\xi_{ij} = 0$ for every $i, j$ for which $\beta_{ij} = \infty$ without
loss of generality.
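
Definition 4 can be illustrated in the same way as Definition 3. The following Python sketch of a non-randomized unlikelihood ratio d.r. assumes $l = m$ and independent observations; the function name, the densities and the weight matrices are assumptions chosen for the example, and ties are broken in favor of the smallest index.

```python
from math import prod
from statistics import NormalDist


def unlikelihood_ratio_rule(x, densities, a):
    """Return the (0-based) index k minimizing sum over i != k of a[i][k] * f_i^n(x).

    a[i][k] is the weight attached to choosing A_k when f_i is true;
    the diagonal entries a[k][k] are never used.
    """
    lik = [prod(f(x_i) for x_i in x) for f in densities]
    m = len(a[0])
    h = [sum(a[i][k] * lik[i] for i in range(len(lik)) if i != k) for k in range(m)]
    return min(range(m), key=h.__getitem__)


# Hypothetical example: unit-variance normal densities with means -1, 0, 1.
densities = [NormalDist(mu, 1.0).pdf for mu in (-1.0, 0.0, 1.0)]
equal = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
# Heavier weight on wrongly choosing A_2 when f_3 is true.
skewed = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 10.0, 0.0]]
x = [0.4, 0.5]
```

With equal off-diagonal weights the rule coincides with choosing the density of largest likelihood; increasing a single weight moves the decision away from the penalized alternative.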
Wald's Theorem 5.3 asserts the existence of a least favorable distribution
$\xi^0$, and that any Bayes d.r. relative to $\xi^0$ is a minimax d.r. and
conversely; moreover,

(11)    $p_{ij}(D^0)/\beta_{ij} = \max_{i',j'} [p_{i'j'}(D^0)/\beta_{i'j'}]$ for any $i, j$ for which $\xi^0_{ij} > 0$.

Apparently, however, there are no general conditions under which all
$\xi^0_{ij} > 0$, and consequently we have no proof of the admissibility of a
minimax d.r. In fact, supposing $l = m$ and the $\beta_{ij}$'s satisfy
$\sum_j \beta_{ij} = 1$ for every $i$, then $\xi^0_{ij} > 0$ for all $i, j$
would imply $p_{ij} = \beta_{ij}$, regardless of the sample size! Geometrically,
the convex body in the $l \cdot m$-dimensional space with coordinate axes
$p_{ij}$, corresponding to all possible d.r.'s for a fixed sample size, is not
necessarily intersected by the line determined by
$p_{ij}/\beta_{ij} = p_{i'j'}/\beta_{i'j'}$ for all pairs of subscripts
corresponding to incorrect decisions. However, we do have the following theorem
in this regard, assuming $l = m$ and $A_i$ is "correct" when $f_i$ is true
($i = 1, \dots, l$).

THEOREM 5. Suppose Assumption 1 holds and that $\sum_{j \ne i} \beta_{ij} < 1$
for every $i$. For any sample size greater than or equal to the M.E. sample
size, a least favorable distribution $\xi^0$ has the property
$\sum_i \xi^0_{ij} > 0$ for every $j$.

The theorem may be proved by contradiction, using Assumption 1, Definition 4,
Lemma 1, and constructing a Bayes d.r. relative to $\xi^0$. From Theorem 5 and
(11), it follows that $p_{ij}(D_N^0)/\beta_{ij}$ attains its maximum for at
least one value of $i$ for every $j$, where $D_N^0$ is a minimax d.r. for
samples of the M.E. size.

Example 2. We shall consider unlikelihood ratio d.r.'s for samples of size $n$
for Example 1 above. For simplicity, suppose $\sigma = 1$, $l = m = 3$, and
$\theta_2 = 0$, the alternatives $A_1$, $A_2$, $A_3$ corresponding respectively
to the densities $f_1$, $f_2$, $f_3$.

A d.r. with acceptance regions

$R_1 = \{x : h_1(x) \le h_2(x),\ h_1(x) \le h_3(x)\},$
$R_2 = \{x : h_2(x) < h_1(x),\ h_2(x) \le h_3(x)\},$
$R_3 = \{x : h_3(x) < h_1(x),\ h_3(x) < h_2(x)\},$

where $h_i(x) = a_{ji} f_j^n + a_{ki} f_k^n$ and $(i, j, k)$ is a permutation of
$(1, 2, 3)$, is an unlikelihood ratio d.r. for the weights $(a_{ij})$. Denoting
the sample mean by $\bar{x}$, we may replace $h_i$ by
$g_i = a_{ji} \exp(n\theta_j \bar{x} - n\theta_j^2/2) + a_{ki} \exp(n\theta_k \bar{x} - n\theta_k^2/2)$.
Now $g_1$ is an increasing function of $\bar{x}$ and $g_3$ a decreasing
function; $g_2$ has a single stationary point, a minimum. By sketching the three
$g_i$ functions, it is clear that if none of the acceptance regions is to be
empty, one of three possibilities must obtain: the acceptance regions are of the
form $R_1 = \{x : \bar{x} \le c_1 \text{ or } c_3 \le \bar{x} \le c_4\}$,
$R_2 = \{x : c_2 \le \bar{x} \le c_3\}$,
$R_3 = \{x : c_1 \le \bar{x} \le c_2 \text{ or } \bar{x} \ge c_4\}$, where
either $c_1 = c_2$, or $c_3 = c_4$, or both. (Equality signs have been assigned
everywhere in the $R_i$'s for simplicity.) Let $c$ (= 2 or 3) denote the number
of $c_i$'s to be determined. The $c_i$'s may be obtained by solving $c + 1$ of
the six equations $p_{ij} = \rho\beta_{ij}$ for the $c_i$'s and $\rho$, the
choice of the equations to be solved being such that $p_{ij} \le \rho\beta_{ij}$
for all six pairs of subscripts. Theorem 5 may be helpful in this choice of
equations. To obtain an M.E. d.r., the sample size $n$ is to be minimized
subject to $\rho = \rho_n \le 1$. Similar methods may be applied to simple
discrimination problems concerning any distribution of the exponential type.


2. Composite discrimination.

A. The problem. In this section we allow a continuum of possible density
functions. For specificity, assume $\Omega$ to be the space of a real- or
vector-valued parameter $\theta$ indexing the class of density functions w.r.t.
$\mu$ with elements $f(x, \theta)$.

We further suppose that disjoint subsets $\omega_1, \dots, \omega_l$ of $\Omega$
are specified such that for every pair $i, j$ ($i = 1, \dots, m$;
$j = 1, \dots, l$) there is a definite preference for or against the decision
$A_i$ if the true $\theta \in \omega_j$. We suppose that none of the decisions
is definitely preferred if $\theta$ is not in some $\omega_j$; this
"indifference region" is excluded from $\Omega$ for convenience. Under these
assumptions, we say that the corresponding decision problem is one of
"composite discrimination" and a d.r. is a d.r. for "discriminating among
$\omega_1, \dots, \omega_l$." A d.r. $D = D_n$ is characterized by the functions

$p_i(\theta, D) = \Pr(D \text{ chooses } A_i \mid \theta) = \int_{\mathcal{X}} \phi_i(x)\, f^n(x, \theta)\, d\mu^n \qquad (i = 1, \dots, m),$

defined for all $\theta \in \Omega$.

We again consider two criteria for choosing a d.r. for composite discrimination.
The first requires $l = m$ and $A_i$ to be a "correct" decision if
$\theta \in \omega_i$ and "incorrect" if $\theta \in \omega_j$ ($j \ne i$). For
the second criterion, we suppose that corresponding to each $\omega_i$ one or
more alternatives $A_j$ is preferable when $\theta \in \omega_i$.

The definitions and comments of Section 1.A may be restated, substituting only
$\omega_i$ for $f_i$, $\inf_{\theta \in \omega_i} p_i(\theta, D)$ for $p_i(D)$,
and $\sup_{\theta \in \omega_i} q_i(\theta, D)$ for $q_i(D)$. When $l = m = 2$,
an M.E. 2-d.r. may be considered as a test of the hypothesis that
$\theta \in \omega_1$ against the class of alternatives $\theta \in \omega_2$,
satisfying bounds on the two kinds of error; such a d.r. may be obtained by
considering, for each $n$, tests of size $1 - \alpha_1$ w.r.t. $\omega_1$ which
maximize the minimum power w.r.t. $\omega_2$, and choosing that test for which
$n$ is a minimum subject to the minimum power being at least $\alpha_2$ ([7]).

Before extending this result to m-d.r.'s for composite discrimination, we
require some results in minimax decision theory.

B. Minimax decision rules for fixed sample sizes. We prove three theorems
which may be useful in finding minimax d.r.'s. Also, if a sufficient statistic
with a monotone likelihood ratio exists, Karlin and Rubin's complete class
theorem may be applicable [1], [8]. Sverdrup's results [13] should also be
noted.

We shall use a number of Wald's results in [14], Sections 3.5 and 5.1.4, with
some alteration in his assumptions and notation. We denote a weight function by
$W(\theta, A_j) = W_j(\theta)$ ($j = 1, \dots, m$) and the corresponding risk
function when using a d.r. $D$ by $r(\theta, D)$. An a priori distribution over
the Borel subsets $\{\omega\}$ of $\Omega$ is denoted by $\Xi = (\xi, \lambda)$,
where $\Xi(\omega) = \Pr(\theta \in \omega) = \sum_{i=1}^{l} \xi_i \lambda_i(\omega)$
and $\xi_i = \Xi(\omega_i)$,
$\lambda_i(\omega) = \Pr(\theta \in \omega \mid \theta \in \omega_i)$
($i = 1, \dots, l$). The average risk relative to $\Xi$ is denoted by
$r(\Xi, D)$. Other terminology and notation will be self-evident.

Wald's Assumptions 5.1 and 5.6, his remarks on page 148 characterizing a Bayes
solution, and his Theorems 5.11, 5.12, 3.8, 3.9, and 3.10 characterizing minimax
solutions are especially pertinent to what follows. Lehmann's existence theorem
for least favorable distributions [9] might also be noted.

ASSUMPTION 2. For each $i, j$ pair ($i = 1, \dots, l$; $j = 1, \dots, m$),
$W_j(\theta)$ equals a constant, say $W_{ij}$, for all $\theta \in \omega_i$.
(That is, for each alternative, the loss varies only from subset to subset among
$\omega_1, \dots, \omega_l$ and not within any subset.) This assumption is
sufficient to imply the validity of Wald's Assumptions 3.1 to 3.6 (see his
remarks on page 148).

For a given set of conditional distributions
$\lambda = (\lambda_1, \dots, \lambda_l)$, we denote

(12)    $f_i^{\lambda}(x) = \int_{\omega_i} f^n(x, \theta)\, d\lambda_i \qquad (i = 1, \dots, l);$

$n$ is fixed and need not be evident in the notation.

THEOREM 6. If Assumption 2 holds, a necessary and sufficient condition for a
d.r. $D^*$ to be a Bayes d.r. relative to $\Xi = (\xi, \lambda)$ for
discriminating among $\omega_1, \dots, \omega_l$ is that $D^*$ be a Bayes d.r.
relative to $\xi$ for discriminating among
$f_1^{\lambda}, \dots, f_l^{\lambda}$ w.r.t. the weight function $W_{ij}$. The
average risks in the two cases are equal.

PROOF. Using Assumption 2 and (12), we have

$\int_{\Omega} W_j(\theta)\, f^n(x, \theta)\, d\Xi = \sum_i \xi_i W_{ij} f_i^{\lambda}(x).$

The first part of the theorem follows immediately, using Wald's Theorem 5.1
and the second paragraph on page 148. By expressing $r(\theta, D)$ as in Wald's
(5.81), interchanging the order of integration, using (12) and Wald's (5.2), we
have for any d.r. $D$,

(13)    $\int_{\omega_i} r(\theta, D)\, d\lambda_i = r(f_i^{\lambda}, D).$

Denoting by $r_{\lambda}(\xi, D)$ the average risk relative to $\xi$ when
discriminating among $f_1^{\lambda}, \dots, f_l^{\lambda}$, and summing (13)
over $i$ with weights $\xi_i$, we have

(14)    $r(\Xi, D) = r_{\lambda}(\xi, D),$


completing the proof.

THEOREM 7. Suppose Assumption 2 holds. Necessary and sufficient conditions
that $\Xi^0 = (\xi^0, \lambda^0)$ be a least favorable distribution and $D^0$ a
minimax d.r. for discriminating among $\omega_1, \dots, \omega_l$ are that

(i) $\xi^0$ is a least favorable distribution and $D^0$ is a minimax d.r.
w.r.t. $W_{ij}$ for discriminating among
$f_1^{\lambda^0}, \dots, f_l^{\lambda^0}$; and

(ii) for any $i$ for which $\xi_i^0 > 0$,
$\int_{\omega_i} r(\theta, D^0)\, d\lambda_i^0 = \sup_{\theta \in \omega_i} r(\theta, D^0)$.

Moreover, the maximum risks in the two cases are equal; i.e.,

(15)    $\sup_{\theta \in \Omega} r(\theta, D^0) = \max_{1 \le i \le l} r(f_i^{\lambda^0}, D^0).$

PROOF. Necessity: Since $\Xi^0$ is least favorable,
$\inf_D r(\Xi^0, D) \ge \inf_D r((\xi, \lambda^0), D)$ for any $\xi$, so that,
using (14), $\inf_D r_{\lambda^0}(\xi^0, D) \ge \inf_D r_{\lambda^0}(\xi, D)$;
that is, $\xi^0$ is least favorable. Using Wald's Theorem 3.9 and then Theorem
6, $D^0$ is a Bayes d.r. relative to $\xi^0$ and a minimax d.r. for
discriminating among $f_1^{\lambda^0}, \dots, f_l^{\lambda^0}$.

We shall now verify (15). Using Wald's Theorem 5.3 (iii),
$\max_i r(f_i^{\lambda^0}, D^0) = \sum_i \xi_i^0\, r(f_i^{\lambda^0}, D^0) = r_{\lambda^0}(\xi^0, D^0)$,
so that together with (14) and Wald's Theorem 3.10, we have
$\max_i r(f_i^{\lambda^0}, D^0) = r(\Xi^0, D^0) = \sup_{\Omega} r(\theta, D^0)$.
Continuing with the necessity, for any $i$ for which $\xi_i^0 > 0$, we have
$r(f_i^{\lambda^0}, D^0) = \max_j r(f_j^{\lambda^0}, D^0)$ and
$\sup_{\omega_i} r(\theta, D^0) = \sup_{\Omega} r(\theta, D^0)$ by Wald's
Theorem 3.10, which, together with (15) and (13), prove (ii).

Sufficiency: By Wald's Theorem 3.9 and Theorem 6, $D^0$ is a Bayes d.r.
relative to $\Xi^0 = (\xi^0, \lambda^0)$; i.e.,
$r(\Xi^0, D^0) = \inf_D r(\Xi^0, D)$. Hence, we need only prove that $\Xi^0$ is
a least favorable distribution. Suppose it is not; then there exists a
$\Xi = (\xi, \lambda)$ such that $\inf_D r(\Xi^0, D) < \inf_D r(\Xi, D)$. But
$\inf_D r(\Xi, D) \le r(\Xi, D^0) = \sum_i \xi_i \int_{\omega_i} r(\theta, D^0)\, d\lambda_i \le \sum_i \xi_i \sup_{\omega_i} r(\theta, D^0) \le \sup_{\Omega} r(\theta, D^0)$.
Combining these last three results,
$r(\Xi^0, D^0) < \sup_{\Omega} r(\theta, D^0)$.

By Wald's Theorem 5.3 (iii), for any $i$ for which $\xi_i^0 > 0$,
$r(f_i^{\lambda^0}, D^0) = \max_j r(f_j^{\lambda^0}, D^0)$, which, together
with (13) and (ii), implies
$\sup_{\omega_i} r(\theta, D^0) = \max_j \sup_{\omega_j} r(\theta, D^0) = \sup_{\Omega} r(\theta, D^0)$.
Hence, from (ii),

$r(\Xi^0, D^0) = \sum_i \xi_i^0 \int_{\omega_i} r(\theta, D^0)\, d\lambda_i^0 = \sup_{\Omega} r(\theta, D^0),$

a contradiction. Q.E.D.
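THEOREM 8. Suppose Assumption 2 holds, and suppose $\{\lambda^{\nu}\}$ is a
sequence of sets of conditional a priori distributions and $D^0$ a d.r. such
that

(16)    $\lim_{\nu \to \infty} \int_{\omega_i} r(\theta, D^{\nu})\, d\lambda_i^{\nu} = \sup_{\omega_i} r(\theta, D^0),$

where for each $\nu = 1, 2, \dots$, $D^{\nu}$ is a minimax d.r. for
discriminating among $f_1^{\lambda^{\nu}}, \dots, f_l^{\lambda^{\nu}}$. Then
$D^0$ is a minimax d.r. for discriminating among $\omega_1, \dots, \omega_l$.

PROOF. By Wald's Theorem 5.3, for each $\nu$ there exists a least favorable
distribution $\xi^{\nu}$, and $D^{\nu}$ is a Bayes d.r. relative to $\xi^{\nu}$
for discriminating among $f_1^{\lambda^{\nu}}, \dots, f_l^{\lambda^{\nu}}$;
i.e., for any d.r. $D$,
$r_{\lambda^{\nu}}(\xi^{\nu}, D^{\nu}) \le r_{\lambda^{\nu}}(\xi^{\nu}, D)$, and
hence, using (14),

(17)    $\sum_i \xi_i^{\nu} \int_{\omega_i} r(\theta, D^{\nu})\, d\lambda_i^{\nu} \le \sum_i \xi_i^{\nu} \int_{\omega_i} r(\theta, D)\, d\lambda_i^{\nu} \le \sup_{\Omega} r(\theta, D).$

Now each sequence $\{\xi_i^{\nu}\}$ has at least one limit point; let
$\{\Xi^{\nu_j}\}$, $j = 1, 2, 3, \dots$, be a sub-sequence of
$\{\Xi^{\nu} = (\xi^{\nu}, \lambda^{\nu})\}$ for which each $\xi_i^{\nu_j}$
converges to a limit, say $\xi_i^0$; then $\sum_i \xi_i^0 = 1$. By Wald's
Theorem 5.3 (iii) and (13), for each $i$ for which $\xi_i^{\nu} > 0$,
$\int_{\omega_i} r(\theta, D^{\nu})\, d\lambda_i^{\nu} = \max_j \int_{\omega_j} r(\theta, D^{\nu})\, d\lambda_j^{\nu}$,
so that, from (16), for each $i$ for which $\xi_i^0 > 0$,
$\sup_{\omega_i} r(\theta, D^0) = \max_j \sup_{\omega_j} r(\theta, D^0) = \sup_{\Omega} r(\theta, D^0)$.
Hence, from (16),
$\lim_{j \to \infty} \sum_i \xi_i^{\nu_j} \int_{\omega_i} r(\theta, D^{\nu_j})\, d\lambda_i^{\nu_j} = \sum_i \xi_i^0 \sup_{\omega_i} r(\theta, D^0) = \sup_{\Omega} r(\theta, D^0)$,
which, together with (17), asserts
$\sup_{\Omega} r(\theta, D^0) \le \sup_{\Omega} r(\theta, D)$ for any $D$.
Q.E.D.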

If a least favorable distribution exists, the problem reduces to one of simple
discrimination, so that if $\mu$ is non-atomic only non-randomized d.r.'s need
be considered. A lemma for the case of composite discrimination analogous to
Lemma 1 may be derived.

C. Most economical decision rules relative to a vector $\alpha$. As in Section
1.B, we shall apply the above theory to two specific weight functions
$W_j(\theta)$ and develop in each case a method of obtaining M.E. d.r.'s
relative to $\alpha$. We assume $l = m$. First, let

(18)    $W(\theta, A_j) = W_j(\theta) = -1/\alpha_j$ if $\theta \in \omega_j$ and 0 otherwise.⁶

The risk w.r.t. $W_j(\theta)$ is $r(\theta, D) = -p_i(\theta, D)/\alpha_i$ if
$\theta \in \omega_i$ ($i = 1, \dots, m$), and
$\sup_{\omega_i} r(\theta, D) = -\inf_{\omega_i} p_i(\theta, D)/\alpha_i$
($i = 1, \dots, m$). By Wald's Theorem 5.12 (i), there exists a minimax d.r.
$D^0$ for any (fixed) sample size.

THEOREM 9. For each $n = 0, 1, 2, \dots$, let $D_n^0$ be a minimax d.r. w.r.t.
the weight function (18) for samples of fixed size $n$. Suppose for some $n$,
$\sup_{\Omega} r(\theta, D_n^0) \le -1$ and let $N$ be the least such integer.
Then $D_N^0$ is an M.E. d.r. relative to $\alpha$ for discriminating among
$\omega_1, \dots, \omega_m$. Conversely, if there exists an M.E. d.r. relative
to $\alpha$ for discriminating among $\omega_1, \dots, \omega_m$, and the M.E.
sample size is $N$, then $D_N^0$ is an M.E. d.r.

The proof is like that of Theorem 1, replacing $p_i(D_n)$ by
$\inf_{\omega_i} p_i(\theta, D_n)$.

Note that (18) satisfies Assumption 2 with $W_{ij}$ given by (3). Hence, if a
least favorable distribution $\Xi^0 = (\xi^0, \lambda^0)$ exists, Theorems 6 and
7 imply that the composite discrimination problem may be treated as a simple
discrimination problem with
$f_i(x) = f_i^{\lambda^0}(x) = \int_{\omega_i} f^n(x, \theta)\, d\lambda_i^0$,
and the theory of Section 1 will be applicable. If a least favorable
distribution does not exist, Theorem 8 asserts that by a similar treatment for a
sequence of a priori distributions having certain properties in the limit, it
may be possible to solve the composite discrimination problem. Now suppose a
least favorable distribution $\Xi^0 = (\xi^0, \lambda^0)$ exists. By Theorem
7(ii),

$\int_{\omega_i} p_i(\theta, D^0)\, d\lambda_i^0 = \inf_{\theta \in \omega_i} p_i(\theta, D^0)$ for any $i$ for which $\xi_i^0 > 0$.

ASSUMPTION 3. If $R$ is a subset of $\mathcal{X}$ for which
$\int_R f^n(x, \theta)\, d\mu^n = 0$ for some $\theta \in \Omega$, then
$\int_R f^n(x, \theta)\, d\mu^n = 0$ for all $\theta \in \Omega$.

This assumption implies Assumption 1 for the density functions
$f_1^{\lambda}, \dots, f_l^{\lambda}$, defined by (12), for any set of
conditional distributions $\lambda$. If Assumption 3 holds, and if a least
favorable distribution exists, it follows from Theorem 2, Wald's Theorem
5.3(iii) and (18) that

(19)    $\frac{1}{\alpha_1} \inf_{\theta \in \omega_1} p_1(\theta, D^0) = \cdots = \frac{1}{\alpha_m} \inf_{\theta \in \omega_m} p_m(\theta, D^0),$

where $D^0$ is a minimax d.r.

6 See footnote 3.


As a second approach, consider the weight function:

(20)    $W(\theta, A_j) = W_j(\theta) = 1/\beta_i$ if $\theta \in \omega_i$, $i \ne j$, and 0 otherwise,

where $\beta_i = 1 - \alpha_i$ as before. Then
$r(\theta, D) = q_i(\theta, D)/\beta_i$ if $\theta \in \omega_i$
($i = 1, \dots, m$). We may proceed analogously to the first approach, making
changes corresponding to those made analogously in Section 1. We thus obtain a
theorem analogous to Theorem 9 and also

THEOREM 10. Suppose Assumption 3 holds and that a least favorable distribution
exists. For any sample size greater than or equal to the M.E. sample size,

(21)    $\frac{1}{\beta_1} \sup_{\theta \in \omega_1} q_1(\theta, D^0) = \cdots = \frac{1}{\beta_m} \sup_{\theta \in \omega_m} q_m(\theta, D^0),$

where $D^0$ is a minimax d.r.

No proof of admissibility of the M.E. d.r.'s derived in this section has been
obtained. However, if Assumption 3 holds and there exists a least favorable
distribution, it can easily be verified that there does not exist a d.r. $D_N$
for which
$\inf_{\omega_i} p_i(\theta, D_N) \ge \inf_{\omega_i} p_i(\theta, D_N^0)$
($i = 1, \dots, m$) with strict inequality for at least one $i$, where $D_N^0$
is an M.E. d.r. obtained by either of the minimax methods.

D. Most economical decision rules relative to a matrix $\beta$. Just as the
approach of Section 1.B was extended in Section 1.C, we shall extend the
approach of Section 2.C in this section to the consideration of M.E. d.r.'s for
composite discrimination relative to $\beta = (\beta_{ij})$.

Suppose $n$ is fixed, and consider parameter spaces $\Omega_1, \dots, \Omega_m$,
each $\Omega_j$ being identical to $\Omega$, and denote
$\Omega' = \bigcup_j \Omega_j$. For each $j$, denote the corresponding subsets
by $\omega_{1j}, \dots, \omega_{lj}$. Define a weight function
$W(\theta, A_k) = W_k(\theta)$ for $k = 1, \dots, m$, by

(22)    $W_k(\theta) = 1/\beta_{ij}$ if $\theta \in \omega_{ij}$ and $j = k$ ($i = 1, \dots, l$; $j = 1, \dots, m$), and 0 otherwise.

Then $r(\theta, D) = p_j(\theta, D)/\beta_{ij}$ if $\theta \in \omega_{ij}$.
Let $\Xi$ be an a priori distribution over $\Omega'$ with components
$\xi_{ij} = \Xi(\omega_{ij})$ and
$\lambda_{ij}(\omega) = \Pr(\theta \in \omega \mid \theta \in \omega_{ij})$. For
a given set of $\lambda$'s, denote

(23)    $g_{ij}^{\lambda}(x) = \int_{\omega_{ij}} f^n(x, \theta)\, d\lambda_{ij}.$

Theorem 9 may be restated and proved, substituting only (22) for (18), $+1$ for
$-1$, and $\beta$ for $\alpha$. The theorems of Section 2.B may be applied to
obtain minimax d.r.'s for composite discrimination w.r.t. the weight function
(22) by replacing $l$ in the theorems by $l' = l \cdot m$ and replacing single
subscripts $i$ by $ij$ and $f_i^{\lambda}$ by $g_{ij}^{\lambda}$. If a least
favorable distribution exists, then the composite discrimination problem reduces
to a problem of simple discrimination among the "average" density functions
$g_{ij}^{\lambda}$ defined by (23) w.r.t. a set of "least favorable conditional
distributions" $\lambda$, and Theorem 5 and the remarks of Section 1.C are
applicable. Thus, this method of solution gives unlikelihood ratio d.r.'s as
M.E. d.r.'s. If a least favorable distribution does not exist, then a minimax
d.r. will be a Bayes d.r. in the wide sense and Theorem 8 may be applicable.

Example 3. Suppose $f(x, \theta)$ is a normal density function with variance
$\sigma^2$ (known) and mean $\theta$, and

$\omega_1 = \{\theta : \theta \le \theta_1\}, \quad \omega_2 = \{\theta : \theta_2' \le \theta \le \theta_2''\}, \quad \omega_3 = \{\theta : \theta \ge \theta_3\},$

for some specified $\theta_1 < \theta_2' \le \theta_2'' < \theta_3$. It may be
shown that the least favorable conditional distributions over $\omega_1$,
$\omega_2$, $\omega_3$ (Theorem 7) assign probability one to $\theta_1$,
$\theta_2$, $\theta_3$, where $\theta_2 = \theta_2'$ or $\theta_2''$ is
determined below. Thus, this example reduces to Example 1. (Karlin and Rubin's
results [8] also imply that a minimax rule will be monotone in $\bar{x}$;
determining the explicit form of the monotone rule is equivalent to showing that
the above distribution is least favorable.)

$\theta_2$ is determined as follows:

(24)    $\theta_2 = \theta_2'$ if $p_2(\theta_2', D') \le p_2(\theta_2'', D')$,
        $\theta_2 = \theta_2''$ if $p_2(\theta_2'', D'') \le p_2(\theta_2', D'')$,

where $D'$ and $D''$ are the solutions to the corresponding simple
discrimination problems with $\theta_2 = \theta_2'$ or $\theta_2''$ for fixed
$n$. We shall show that such a determination of $\theta_2$ is complete and
consistent by showing that if $p_2(\theta_2'', D'') > p_2(\theta_2', D'')$ then
$p_2(\theta_2'', D') > p_2(\theta_2', D')$ and conversely. From (9), with either
a prime or double-prime on $\rho$, $D$, $c_1$, and $c_2$, we have
$c_1 = \theta_1 + \sigma\Phi^{-1}(\alpha_1\rho)/\sqrt{n}$ and

$c_2 = \theta_3 + \sigma\Phi^{-1}(1 - \alpha_3\rho)/\sqrt{n},$

where $\Phi^{-1}(x) = t$ is defined by $\Phi(t) = x$. Substituting in
$p_2(\theta, D)$ it becomes clear that it is a decreasing function of $\rho$ for
fixed $\theta$. Now $\alpha_2\rho' = p_2(\theta_2', D')$ and
$\alpha_2\rho'' = p_2(\theta_2'', D'')$ so that

(25)    $\alpha_2(\rho'' - \rho') = p_2(\theta_2'', D'') - p_2(\theta_2', D').$

Suppose $p_2(\theta_2'', D'') > p_2(\theta_2', D'')$; substituting in (25), it
follows that $\rho'' > \rho'$ since $p_2$ is a decreasing function of $\rho$.
For the same reasons,

$0 < \alpha_2(\rho'' - \rho') < p_2(\theta_2'', D') - p_2(\theta_2', D').$

Conversely, in the same manner, if $p_2(\theta_2', D') > p_2(\theta_2'', D')$,
then

$\alpha_2(\rho' - \rho'') > p_2(\theta_2'', D') - p_2(\theta_2'', D''),$

and $\rho'$ must be greater than $\rho''$; hence,

$0 < \alpha_2(\rho' - \rho'') < p_2(\theta_2', D'') - p_2(\theta_2'', D'').$

Other examples with exponential density functions may be treated analogously,
and also similar examples for Section 2.D.
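
The selection rule (24) can be sketched in Python for a fixed $n$. This is an illustration only: the function names, the bisection used to solve the three-decision form of (9), and the numerical values are assumptions rather than part of the paper, and each condition of (24) is read here as a non-strict inequality.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def p2(theta, c1, c2, n, sigma=1.0):
    """Probability of choosing A_2 (c1 <= sample mean <= c2) when the mean is theta."""
    return (PHI.cdf(sqrt(n) * (c2 - theta) / sigma)
            - PHI.cdf(sqrt(n) * (c1 - theta) / sigma))


def solve_cutoffs(n, thetas, alphas, sigma=1.0, iters=200):
    """Solve the m = 3 case of (9) by bisection on rho; returns (c1, c2, rho)."""
    (t1, t2, t3), (a1, a2, a3) = thetas, alphas

    def cut(rho):
        c1 = t1 + sigma * PHI.inv_cdf(a1 * rho) / sqrt(n)
        c2 = t3 + sigma * PHI.inv_cdf(1.0 - a3 * rho) / sqrt(n)
        return c1, c2

    lo, hi = 1e-12, (1.0 - 1e-12) / max(a1, a3)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p2(t2, *cut(mid), n, sigma) > a2 * mid:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    return (*cut(rho), rho)


def least_favorable_theta2(n, theta1, theta2_lo, theta2_hi, theta3, alphas, sigma=1.0):
    """Apply (24): pick the endpoint of omega_2 at which p_2 is smallest."""
    c1a, c2a, _ = solve_cutoffs(n, (theta1, theta2_lo, theta3), alphas, sigma)  # D'
    c1b, c2b, _ = solve_cutoffs(n, (theta1, theta2_hi, theta3), alphas, sigma)  # D''
    if p2(theta2_lo, c1a, c2a, n, sigma) <= p2(theta2_hi, c1a, c2a, n, sigma):
        return theta2_lo
    if p2(theta2_hi, c1b, c2b, n, sigma) <= p2(theta2_lo, c1b, c2b, n, sigma):
        return theta2_hi
    raise RuntimeError("neither condition of (24) held; check numerical accuracy")


# Hypothetical problem: theta_1 = -1, omega_2 = [-0.2, 0.3], theta_3 = 1.
theta2 = least_favorable_theta2(9, -1.0, -0.2, 0.3, 1.0, (0.9, 0.9, 0.9))
```

In this symmetric example the cut-offs satisfy $c_2 = -c_1$ for either choice, so $p_2$ is smaller at the endpoint farther from zero and the rule selects $\theta_2 = 0.3$.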

Example 4. Now suppose $\sigma$ is also unknown; denote the mean by $\mu$ and
replace $\theta$ in the $\omega_i$'s defined in Example 3 by $\mu/\sigma$.

Denoting Student's ratio by $t$ and the sample sum of squares by $s^2$, $(t, s)$ is
sufficient for $(\mu, \sigma)$. If we invoke invariance (under changes in scale), it
follows from Blackwell and Girshick's work [1] that a minimax invariant rule
must be monotone in $t$. Theorem 8.8.1 in [1] proves, for the m-decision case as
well as the 2-decision case, that invariance is no restriction when discriminating
among $\theta_1, \dots, \theta_m$, where $\theta = \mu/\sigma$. Thus a minimax d.r. for discriminating
among $\theta_1, \theta_2, \theta_3$ is monotone in $t$. By showing that the risk for a monotone rule
is a maximum in $\omega_i$ at $\mu/\sigma = \theta_i$ (with $\theta_2$ determined as in Example 3), it will
follow that a monotone rule in $t$, with $c_1$, $c_2$ and $p$ determined by equations of
the form (9) with the $\Phi$'s replaced by non-central $t$ distribution functions, will
be minimax for discriminating among $\omega_1, \omega_2, \omega_3$.

Alternatively, this same result may be obtained by an application of our
Theorem 8, letting $\lambda_i$ assign probability one to sets of $(\mu, \sigma)$ in which
$\mu/\sigma = \theta_i$,

and letting o ° be distributed as x° with degrees of freedom tending to 0 as 
v— #. The details appear in [3], adapted from a 2-d.r. argument by Hoeffding. 

Example 5. We shall derive a three-decision extension of the sign test for 
the median of an arbitrary distribution function by adapting an example of 
Hoeffding [6]. (See also [12].) Analogously, an M.E. d.r. concerning any quartile 
of an arbitrary distribution may be derived. 

Let 2 be the class of all density functions f w.r.t. a fixed measure » on the 
real line such that n{z < 0} > 0, upijx > 0} > 0. Denote @(f) = force f(x) dy. 
Given 0, 02, 02,010 <0 <6; 545 & < 0 <1), letw, = {f:0(f) < A}, 


we = {f:% < O(f) < Or}, ws = {f:0(f) = 63}. The alternatives A,, Ao, As, 
corresponding to $\omega_1, \omega_2, \omega_3$, might be that the median of the unknown
distribution is "appreciably" less than zero, "close" to zero, or "appreciably" greater
than zero, respectively.

Let f(x, 0) = 61 — 0) /e if x < c¢ and O otherwise where c is an arbi- 
trary positive constant and b(x) = 1 if « S O and O otherwise, and let \ = 
(Ay, Ae, As) be a set of conditional distributions over w , we, w; , respectively, 
where X, assigns probability 1 to f(z, 6;) and where 62 is to be determined as in 
Example 3. It is easily verified that a minimax d.r. D, (n fixed) for discriminat- 
ing among fi , f2 , f3 is monotone in (x) = de b(x,), the number of non-positive 
observations, with c; , c. and values of ¢; when ¢t = ¢; or c. determined so that 
pi(0:, Da) = ap (t = 1, 2, 3) for some p; and 


pil@, D,) = Ble sane 1) + a,b(e1), p3(9, D,) = l _ B(c2) + (1 _ az)b(ce2), 


p(0, Dra) = Bleo — 1) + agb(cee) — Bley) + (1 — a,)b(e,), 
where $B = B_{n,\theta}$ and $b = b_{n,\theta}$ denote the binomial distribution function and
probability function, respectively, and a; = ¢,(¢e;)) = 1 — i4:(c,). (It may be 
shown that D, defined above is also minimax for discriminating among 


b a a 


Dn, ’ Dn 6. ’ n 03 


This $\lambda$ may be shown to be least favorable, and an M.E. d.r. may be obtained
according to Theorem 9 (see Example 1).


ESTIMATION OF THE MEANS OF DEPENDENT VARIABLES


By Olive Jean Dunn¹
Statistical Laboratory, Iowa State College 


1. Summary. Methods are given for constructing sets of simultaneous con- 
fidence intervals for the means of variables which follow a multivariate normal 
distribution. 

In section (3), a set of confidence intervals is obtained for each of two special 
cases; first when the variances are assumed to be known, and second when the 
variances are assumed to be equal. These two sets have the property that the con- 
fidence is known exactly, rather than merely being bounded below. In the case of 
known variances, the intervals are of fixed lengths (i.e., the lengths are the same 
from sample to sample); when the variances are unknown, the intervals are of 
variable lengths. It may be surprising to note that nothing need be known about 
the covariances in order to obtain confidence intervals of fixed lengths whose 
confidence coefficient is exact. These intervals are long, and do not make use of all 
the information provided by the sample. 

Each of sections (4) to (7) considers a different method for obtaining confidence 
intervals of bounded confidence level. In each section a set of fixed lengths is 
obtained when the variances are assumed to be known, while a set of variable 
lengths is obtained when the variances are unknown but equal. In section (5) the
set of variable lengths applies to the general multivariate normal distribution; all
the other confidence intervals in this paper require some assumption concerning
the variances.

In section (8) the sets of intervals are compared on the basis of length. One of 
the bounded confidence level methods, which has been established only for two or 
three variables or for an arbitrary number of variables with a special type of 
correlation matrix, is shown to yield the best possible set. Another of the bounded 
confidence level methods, whose use is established in general, is shown to be 
almost as good as the best set for confidence coefficients of practical interest. 

It is interesting to notice that intervals with bounded confidence level are
found which are much shorter than the ones whose confidence level is exact. This
need not surprise us, however. In the case of just one variable, we might easily 
find that the 95% confidence intervals for the mean using the t-statistic were 
shorter on the average than 94% confidence intervals using order statistics. 
Moreover, since in admitting sets of confidence intervals with bounded con- 
fidence level we consider a much broader class of methods, we might almost 
expect that some of them would give better intervals. 


2. Introduction. The problem of estimating the unknown means of dependent
variables arises frequently in situations where repeated measurements are made
on the same individuals, and the assumption of independence is unjustifiable. In
biological research, for example, growth data are often obtained with measurements
taken on n individuals at k different times; the measurements would be
highly correlated. The psychologist might measure n individuals' responses to k
different levels of a stimulus; again, a high degree of dependence would be expected.
The point estimates chosen for the means would be the same as for independent
variables; in this paper we wish to develop simultaneous confidence
intervals for the means.

Received February 15, 1957; revised March 25, 1958.

¹ Now at the University of California at Los Angeles.

Let $y_1, \dots, y_k$ be k jointly distributed variables whose means are $\mu_1, \dots, \mu_k$
respectively. A set of simultaneous confidence intervals for $\mu_1, \dots, \mu_k$ with
confidence coefficient $1 - \alpha$ consists of 2k functions of the sample values, say
$g_i$ and $h_i$, $i = 1, 2, \dots, k$, with the following property: if $E_i$ is the event that
the interval $g_i$ to $h_i$ covers $\mu_i$, $i = 1, 2, \dots, k$, then the probability that $E_1$,
$E_2, \dots, E_k$ occur simultaneously is greater than or equal to $1 - \alpha$, where
$0 < \alpha < 1$. Symbolically,

$$P(E_1 E_2 \cdots E_k) = P(g_1 < \mu_1 < h_1, \dots, g_k < \mu_k < h_k) \ge 1 - \alpha.$$


If the inequality sign holds, the set is of bounded confidence level. 

Paul G. Hoel has in a recent paper [1] given a method for estimating a mean 
regression curve and a confidence band for it which is applicable to the situations 
we have in mind provided one assumes the existence of a polynomial regression 
curve of a given degree. In this paper we shall assume that the experimenter is 
actually interested in the regression curve, but is either unwilling to make the 
necessary laborious calculations or else is unable to make the necessary assump- 
tions concerning its form. He knows that there exist methods for studying linear 
contrasts among the means, but this is not what he wishes to do. He might in- 
deed decide to make k different 95% confidence intervals, one for each of the 
k means; this is satisfactory only when he focuses on one individual mean. 

We shall assume, then, that he will welcome a set of k confidence intervals, one 
for each mean, being assured, with a high probability, that such a set covers all 
k means simultaneously. 

Another situation in which such a set of intervals would be useful arises when 
a regression line, curve, or surface has been fitted, and several predictions are 
made on the basis of it. 

Suppose, for example, that the assumption has been made that the variables
$x_i$ are normally distributed with means $\alpha + \beta t_i$ and variances $\sigma^2$, and that the
maximum likelihood estimate $\hat\alpha + \hat\beta t_i$ has been calculated from a sample of size m.

At any particular value of $t$, say $t_0$, one can obtain a prediction interval for
$x_0$, an observation drawn at random from the x's belonging to $t_0$, by using the
fact that $u_0 = x_0 - \hat\alpha - \hat\beta t_0$ is normally distributed. But the research worker is
cautioned not to do this for more than one value of $t$, and of course this is exactly
what he wishes to do.

If he goes ahead and gets such intervals at k different points, say $t_1, \dots, t_k$,
he has the same unsatisfactory situation as with repeated tests of significance.
The variables $u_i^* = x_i^* - \hat\alpha - \hat\beta t_i$, where $x_i^*$ is an observation chosen at random
from the x's at $t = t_i$, $i = 1, 2, \dots, k$, are normally distributed and are correlated;
thus the methods of this paper may be used to give simultaneous prediction
intervals for the points $x_1^*, \dots, x_k^*$.


3. Confidence regions using independent linear combinations. 


3.1. Assuming first known variances, we seek independent linear combinations 
of the sample values which can be used to give a set of confidence intervals of 
fixed lengths whose confidence level is exact. 


The observations $y_{1j}, y_{2j}, \dots, y_{kj}$, $j = 1, \dots, n$, are a random sample of n
observations from $n(y_1, \dots, y_k)$, the multivariate normal distribution with
unknown means $\mu_1, \dots, \mu_k$, known variances $\sigma_1^2, \dots, \sigma_k^2$, and unknown
covariances $\lambda_{is}$, $i \ne s$.

Let $z_i = \sum_{j=1}^{n} a_{ji} y_{ij}$, $i = 1, \dots, k$, with the following restrictions on the $a_{ji}$:

(1) $\sum_{j=1}^{n} a_{ji} = 1$, $i = 1, \dots, k$;

(2) $\sum_{j=1}^{n} a_{ji} a_{js} = 0$, $i \ne s$;

(3) $\sum_{j=1}^{n} a_{ji}^2 = c^2$, $i = 1, \dots, k$.

The means, variances, and covariances of the $z_i$ may then be calculated,
remembering that $E(y_{ij} - \mu_i)(y_{sj} - \mu_s) = \lambda_{is}$, but that (since two observations
in a random sample are independent) $E(y_{ij} - \mu_i)(y_{sj'} - \mu_s) = 0$ for $j \ne j'$. The
means of the $z_i$ are calculated to be $\mu_i$, $i = 1, \dots, k$, their variances are
proportional to $\sigma_1^2, \dots, \sigma_k^2$, and their covariances are zero.

To determine the $a_{ji}$, let $A = (a_{ji})$, an $n \times k$ matrix. The columns of A may
be considered to be k vectors in an n-dimensional Euclidean space, each with an
end fixed at the origin. The three conditions imply (1) that the k vectors have
their endpoints on the plane which passes through the unit points on the coordinate
axes, $P$: $\sum_{j=1}^{n} a_j = 1$; (2) that they be mutually orthogonal; and (3)
that their lengths equal c.

If $n \ge k$, the columns of

$$D = \begin{pmatrix} c & 0 & \cdots & 0 \\ 0 & c & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & c \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$

an $n \times k$ matrix, are k mutually orthogonal vectors of length c whose endpoints lie on any
plane

$$P': \quad \frac{a_1}{c} + \cdots + \frac{a_k}{c} + \frac{a_{k+1}}{m_{k+1}} + \cdots + \frac{a_n}{m_n} = 1.$$

The plane $P'$ can be rotated into the plane $P$ provided the distances of the two
planes from the origin are equal; this will be true if

$$c^2 = \frac{k}{n - \dfrac{1}{m_{k+1}^2} - \cdots - \dfrac{1}{m_n^2}}.$$

To make the lengths of the confidence intervals formed from the $z_i$ as small as
possible, $c^2$ should be minimized. This is accomplished by choosing for $P'$ the
plane $\sum_{j=1}^{k} (a_j/c) = 1$; then $c^2 = k/n$.

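The solution is then $A = BCD$, where B is an $n \times n$ orthogonal matrix whose
first column consists of the elements $n^{-1/2}$;

$$C = \begin{pmatrix} \frac{1}{\sqrt{k}} & \cdots & \frac{1}{\sqrt{k}} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ \cdot & \cdots & \cdot & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix},$$

an $n \times n$ matrix consisting of zeros except for a $k \times k$ orthogonal matrix in the upper left
corner whose first row is $k^{-1/2}, \dots, k^{-1/2}$; and D is defined as before, with $c = (k/n)^{1/2}$.
C rotates the column vectors of D into vectors whose endpoints lie on the
plane $a_1 = n^{-1/2}$; $B'$ rotates the plane $\sum_{j=1}^{n} a_j = 1$ into the plane $a_1 = n^{-1/2}$, so
that B rotates the k mutually orthogonal vectors of length $(k/n)^{1/2}$ into vectors
whose endpoints lie on the plane $\sum_{j=1}^{n} a_j = 1$. The problem thus reduces to that
of writing down a $k \times k$ orthogonal matrix and an $n \times n$ orthogonal matrix.
The $z_1, \dots, z_k$ are then independently normally distributed with means $\mu_i$
and variances $(k/n)\sigma_i^2$. Thus

$$P\left(z_1 - \sqrt{\tfrac{k}{n}}\,\sigma_1 c_\alpha < \mu_1 < z_1 + \sqrt{\tfrac{k}{n}}\,\sigma_1 c_\alpha, \; \dots, \; z_k - \sqrt{\tfrac{k}{n}}\,\sigma_k c_\alpha < \mu_k < z_k + \sqrt{\tfrac{k}{n}}\,\sigma_k c_\alpha\right) = 1 - \alpha,$$

where $c_\alpha$ is defined by

$$N(c_\alpha) = \frac{1 + (1 - \alpha)^{1/k}}{2},$$

with N the cumulative distribution function of the standard normal variable.
The set of confidence intervals is $z_i \pm (k/n)^{1/2} \sigma_i c_\alpha$.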
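The construction A = BCD can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes the two orthogonal matrices are obtained by Gram-Schmidt orthogonalization (any orthogonal matrices with the required first column and first row would do), and the helper names `orthonormal_basis`, `matmul` and `weight_matrix` are hypothetical.

```python
import math

def orthonormal_basis(first):
    """Gram-Schmidt: extend the unit vector `first` to an orthonormal basis."""
    n = len(first)
    basis = [list(first)]
    for j in range(n):
        if len(basis) == n:
            break
        v = [1.0 if i == j else 0.0 for i in range(n)]
        for b in basis:  # remove components along the vectors already chosen
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:
            basis.append([vi / norm for vi in v])
    return basis

def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def weight_matrix(n, k):
    """Return A = BCD (n x k) for the known-variance case, with c = sqrt(k/n)."""
    c = math.sqrt(k / n)
    b_cols = orthonormal_basis([1 / math.sqrt(n)] * n)
    B = [[b_cols[col][row] for col in range(n)] for row in range(n)]  # first column n^(-1/2)
    q_rows = orthonormal_basis([1 / math.sqrt(k)] * k)                # first row k^(-1/2)
    C = [[q_rows[r][s] if r < k and s < k else 0.0 for s in range(n)] for r in range(n)]
    D = [[c if r == s else 0.0 for s in range(k)] for r in range(n)]
    return matmul(B, matmul(C, D))

A = weight_matrix(n=6, k=3)
for i in range(3):
    col = [row[i] for row in A]
    # column sum (condition 1) and squared length (condition 3, equal to k/n)
    print(round(sum(col), 6), round(sum(a * a for a in col), 6))
```

Each column of A gives the weights $a_{ji}$ applied to the n observations on the ith variable; the sketch only verifies conditions (1) to (3) and says nothing about which choice of B and C is preferable in practice.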


3.2. When the variances are unknown but are assumed to be equal, the same
method may be used to construct t-variables whose numerators are independent
but which have the same denominator, provided $n > k$. Let $\sigma_i^2 = \sigma^2$, $i = 1, \dots, k$.

Let

$$z_i = \sum_{j=1}^{n} a_{ji} y_{ij}, \quad i = 1, \dots, k,$$

and

$$u_m = \sum_{j=1}^{n} b_{jm} y_{rj}, \quad m = 1, \dots, n - k,$$

where r is any integer from 1 to k. The choice of r is arbitrary. It may be the same
for each $u_m$, or different r's may be used for the different values of m. The problem
is to determine the $a_{ji}$ and $b_{jm}$ so that $z_1, \dots, z_k, u_1, \dots, u_{n-k}$ will be
independently normally distributed variables with $E(z_i) = \mu_i$, $i = 1, \dots, k$;
$E(u_m) = 0$, $m = 1, \dots, n - k$; $E(z_i - \mu_i)^2 = (k/n)\sigma^2$, $i = 1, \dots, k$; $E(u_m^2) = \sigma^2$,
$m = 1, \dots, n - k$. This will be accomplished provided

(1) $\sum_{j=1}^{n} a_{ji} = 1$, $i = 1, \dots, k$, since $E(z_i) = \mu_i \sum_{j=1}^{n} a_{ji} = \mu_i$.

(2) $\sum_{j=1}^{n} a_{ji}^2 = k/n$, $i = 1, \dots, k$, since $E(z_i - \mu_i)^2 = \sigma^2 \sum_{j=1}^{n} a_{ji}^2$.

(3) $\sum_{j=1}^{n} a_{ji} a_{js} = 0$, $i \ne s$, since $E(z_i - \mu_i)(z_s - \mu_s) = \lambda_{is} \sum_{j=1}^{n} a_{ji} a_{js} = 0$.

(4) $\sum_{j=1}^{n} b_{jm} = 0$, $m = 1, \dots, n - k$, since $E(u_m) = \mu_r \sum_{j=1}^{n} b_{jm} = 0$.

(5) $\sum_{j=1}^{n} b_{jm}^2 = 1$, $m = 1, \dots, n - k$, since $E(u_m^2) = \sigma^2 \sum_{j=1}^{n} b_{jm}^2 = \sigma^2$.

(6) $\sum_{j=1}^{n} b_{jm} b_{js} = 0$, $m \ne s$, since $E(u_m u_s) = E(y_{rj} - \mu_r)(y_{r'j} - \mu_{r'}) \sum_{j=1}^{n} b_{jm} b_{js} = 0$.

(7) $\sum_{j=1}^{n} a_{ji} b_{jm} = 0$, $i = 1, \dots, k$, $m = 1, \dots, n - k$, since
$E(z_i - \mu_i)(u_m) = \lambda_{ir} \sum_{j=1}^{n} a_{ji} b_{jm} = 0$.

Thus n mutually orthogonal vectors are needed, k of length $(k/n)^{1/2}$ with endpoints
on the plane $\sum_{j=1}^{n} a_j = 1$, and $n - k$ of length one with endpoints on the
plane $\sum_{j=1}^{n} a_j = 0$.


Let

$$D = \begin{pmatrix} \sqrt{k/n} & & & & & \\ & \ddots & & & & \\ & & \sqrt{k/n} & & & \\ & & & 1 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix},$$

an $n \times n$ diagonal matrix (the first k diagonal elements equal $\sqrt{k/n}$, the last $n - k$
equal one) whose columns are n mutually orthogonal vectors of the needed
lengths.

Let

$$C = \begin{pmatrix} \frac{1}{\sqrt{k}} & \cdots & \frac{1}{\sqrt{k}} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ \cdot & \cdots & \cdot & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 \end{pmatrix},$$

an $n \times n$ orthogonal matrix which rotates the first k columns of D into vectors
whose endpoints lie on the plane $a_1 = n^{-1/2}$ and which leaves the last $n - k$ columns
unchanged.

Let B be an $n \times n$ orthogonal matrix whose first column consists entirely of
the elements $n^{-1/2}$. Since B rotates the plane $a_1 = n^{-1/2}$ into the plane $\sum_{j=1}^{n} a_j = 1$,
it must also rotate the parallel plane $a_1 = 0$ into $\sum_{j=1}^{n} a_j = 0$.

Thus $A = BCD$ is an $n \times n$ matrix whose columns are orthogonal vectors.
The first k are of length $(k/n)^{1/2}$ and have endpoints on $\sum_{j=1}^{n} a_j = 1$; the last $n - k$
are of length one and have endpoints on $\sum_{j=1}^{n} a_j = 0$.
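Then let

$$t_i = \frac{z_i - \mu_i}{\sqrt{\dfrac{k}{n(n - k)} \sum_{m=1}^{n-k} u_m^2}}, \quad i = 1, \dots, k.$$

These are k t-variables whose numerators are independent but whose denominators
are the same. Their frequency function is (see [2]):

$$f_{n-k}(t_1, \dots, t_k) = \frac{\Gamma\left(\frac{n}{2}\right)}{[\pi(n - k)]^{k/2}\, \Gamma\left(\frac{n-k}{2}\right)} \left(1 + \frac{\sum_{i=1}^{k} t_i^2}{n - k}\right)^{-n/2}.$$

If $c_\alpha$ is defined by

$$\int_{-c_\alpha}^{c_\alpha} \cdots \int_{-c_\alpha}^{c_\alpha} f_{n-k}(t_1, \dots, t_k)\, dt_1 \cdots dt_k = 1 - \alpha,$$

then $P(-c_\alpha < t_1 < c_\alpha, \dots, -c_\alpha < t_k < c_\alpha) = 1 - \alpha$. Thus an exact set of
confidence intervals of equal but variable lengths is obtained:

$$z_i \pm c_\alpha \sqrt{\frac{k}{n(n - k)} \sum_{m=1}^{n-k} u_m^2}, \quad i = 1, \dots, k.$$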


4. Intervals of bounded confidence level using the chi-square distribution and
Hotelling's $T^2$-distribution.


4.1. Known variances. For a sample of size n from the multivariate normal
distribution with means $\mu_1, \dots, \mu_k$ and covariance matrix $(\lambda_{is})$, the expression
$n \sum_{i=1}^{k} \sum_{s=1}^{k} \lambda^{is} (\bar y_i - \mu_i)(\bar y_s - \mu_s)$ follows a Chi-square distribution with k degrees
of freedom. Here $\lambda^{is}$ denotes an element of the inverse matrix $(\lambda^{is}) = (\lambda_{is})^{-1}$,
and $\bar y_i$ is the sample mean of the observations on $y_i$. Then

$$\sum_{i=1}^{k} \sum_{s=1}^{k} \lambda^{is} (\bar y_i - \mu_i)(\bar y_s - \mu_s) = \frac{c_\alpha}{n},$$

where $c_\alpha$ is defined by $U_k(c_\alpha) = 1 - \alpha$, with $U_k$ the cumulative distribution function
of a Chi-square variable with k degrees of freedom. In the parameter space
of the $\mu_1, \dots, \mu_k$, this equation defines an ellipsoid, which will be denoted by E.
Then

$$P\left(\sum_{i=1}^{k} \sum_{s=1}^{k} \lambda^{is} (\bar y_i - \mu_i)(\bar y_s - \mu_s) \le \frac{c_\alpha}{n}\right) = P[E \text{ covers } (\mu_1, \dots, \mu_k)] = 1 - \alpha.$$

To obtain a rectangular confidence region of bounded confidence level, a rectangular
parallelepiped, say R, with boundary planes parallel to the coordinate
planes in the $\mu_1, \dots, \mu_k$ space is circumscribed around the ellipsoid E. The
boundary planes of R are found to be

$$\mu_i = \bar y_i \pm \frac{\sigma_i}{\sqrt{n}} \sqrt{c_\alpha},$$

and are not dependent on the correlations.
Then $P[R \text{ covers } (\mu_1, \dots, \mu_k)] > P[E \text{ covers } (\mu_1, \dots, \mu_k)] = 1 - \alpha$, thus
giving a set of intervals, $\bar y_i \pm (\sigma_i / n^{1/2}) c_\alpha^{1/2}$, with $U_k(c_\alpha) = 1 - \alpha$.
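The claim that the boundary planes of R do not depend on the correlations can be illustrated for k = 2. The sketch below is only an illustration under stated assumptions: it scans the boundary of the ellipse numerically (the function name `box_half_widths` and the grid size are hypothetical choices), and it uses the fact that for 2 degrees of freedom the Chi-square point has the closed form $c_\alpha = -2 \ln \alpha$.

```python
import math

def box_half_widths(sigma1, sigma2, rho, c, n, steps=20000):
    """Largest |d_1| and |d_2| on the ellipse n * d' Lambda^{-1} d = c (k = 2)."""
    # Cholesky factor L of Lambda, so that d = sqrt(c/n) * L u with u on the unit circle.
    l11 = sigma1
    l21 = rho * sigma2
    l22 = sigma2 * math.sqrt(1 - rho * rho)
    r = math.sqrt(c / n)
    m1 = m2 = 0.0
    for s in range(steps):
        a = 2 * math.pi * s / steps
        u1, u2 = math.cos(a), math.sin(a)
        m1 = max(m1, abs(r * l11 * u1))
        m2 = max(m2, abs(r * (l21 * u1 + l22 * u2)))
    return m1, m2

alpha, n = 0.05, 10
c_alpha = -2 * math.log(alpha)          # upper alpha point of Chi-square, 2 d.f.
for rho in (0.0, 0.5, 0.9):
    print(rho, box_half_widths(1.0, 2.0, rho, c_alpha, n))
```

For every value of rho the two half-widths should agree, up to the resolution of the scan, with $\sigma_i \sqrt{c_\alpha / n}$.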


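4.2. Unknown variances. The same method applies when the variances are unknown
and $n > k$, using Hotelling's $T^2$-statistic. Here E is the ellipsoid
$\sum_{i=1}^{k} \sum_{s=1}^{k} l^{is} (\bar y_i - \mu_i)(\bar y_s - \mu_s) = c_\alpha / n$, where $(l^{is})$ is the inverse of the matrix $(l_{is})$
and $l_{is} = \sum_{j=1}^{n} (y_{ij} - \bar y_i)(y_{sj} - \bar y_s)/(n - 1)$, $i = 1, \dots, k$; $s = 1, \dots, k$. The
boundary planes of R, the circumscribed parallelepiped, are $\mu_i = \bar y_i \pm (\hat\sigma_i / n^{1/2}) c_\alpha^{1/2}$,
where $\hat\sigma_i^2 = l_{ii}$. For $c_\alpha$ defined by $F(c_\alpha) = 1 - \alpha$, with F the c.d.f. of Hotelling's
$T^2$, the set of confidence intervals is $\bar y_i \pm (\hat\sigma_i / n^{1/2}) c_\alpha^{1/2}$, $i = 1, 2, \dots, k$.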


It is to be noted that this is the only set of intervals given in this paper for
which no assumption has been made concerning the variances. For the other sets,
the variances were assumed to be known or else to be unknown but equal.


4.3. More general distribution functions. For n large, $T^2$ can be assumed to
follow a Chi-square distribution with k degrees of freedom, even though the
original variables are not normally distributed [3]. A set of confidence intervals
for $\mu_1, \dots, \mu_k$ is then $\bar y_i \pm (\hat\sigma_i / n^{1/2}) c_\alpha^{1/2}$, with $c_\alpha$ the upper $\alpha$ point of the Chi-square
distribution with k degrees of freedom.


5. Bounded regions based on linear contrasts. Henry Scheffé [4] obtains
simultaneous confidence intervals for the totality of linear contrasts among k
means, $\mu_1, \dots, \mu_k$, using the F distribution. He shows that
$P(\hat\theta - S\hat\sigma_{\hat\theta} \le \theta \le \hat\theta + S\hat\sigma_{\hat\theta}) = 1 - \alpha$.
Here $\theta$ is any linear contrast; $S^2 = (k - 1)c_\alpha$; $c_\alpha$ is the upper
$\alpha$ point of the F distribution with $k - 1$ and $\nu$ degrees of freedom; $\nu$ is the degrees
of freedom of the $\chi^2$ variable used in estimating the variance; and P denotes
the probability that all such intervals cover their corresponding contrasts.
It can easily be shown that confidence intervals for the totality of linear
combinations of $\mu_1, \dots, \mu_k$ are similarly obtained from
$P(\hat\theta - S\hat\sigma_{\hat\theta} \le \theta \le \hat\theta + S\hat\sigma_{\hat\theta}) = 1 - \alpha$,
where now $S^2 = k c_\alpha$, with $c_\alpha$ the upper $\alpha$ point of the F distribution with
k and $\nu$ degrees of freedom. Since the k means $\mu_1, \dots, \mu_k$ are a subset of the
linear combinations, confidence intervals for them follow immediately.


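5.1. Variances known. If the variables $y_1, \dots, y_k$ are normally distributed
with unknown means $\mu_1, \dots, \mu_k$, known variances $\sigma_1^2, \dots, \sigma_k^2$, and unknown
correlations $\rho_{is}$, then the $\chi^2$ distribution is used rather than the F distribution,
and we have:

$$P\left(\bar y_1 - \frac{\sigma_1}{\sqrt{n}} \sqrt{c_\alpha} \le \mu_1 \le \bar y_1 + \frac{\sigma_1}{\sqrt{n}} \sqrt{c_\alpha}, \; \dots, \; \bar y_k - \frac{\sigma_k}{\sqrt{n}} \sqrt{c_\alpha} \le \mu_k \le \bar y_k + \frac{\sigma_k}{\sqrt{n}} \sqrt{c_\alpha}\right) \ge 1 - \alpha.$$

Here $c_\alpha$ is, as in section 4.1, the upper $\alpha$ point of the $\chi^2$ distribution with k degrees
of freedom, and the intervals obtained are the same as those of section 4.1.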


5.2. Variances unknown but equal. When the variances are unknown but
equal, then as an estimate of $\sigma^2$ one may use $\hat\sigma_1^2 = \sum_{j=1}^{n} (y_{1j} - \bar y_1)^2/(n - 1)$. Then

$$P\left(\bar y_1 - \sqrt{\tfrac{k}{n}}\, \hat\sigma_1 \sqrt{c_\alpha} \le \mu_1 \le \bar y_1 + \sqrt{\tfrac{k}{n}}\, \hat\sigma_1 \sqrt{c_\alpha}, \; \dots, \; \bar y_k - \sqrt{\tfrac{k}{n}}\, \hat\sigma_1 \sqrt{c_\alpha} \le \mu_k \le \bar y_k + \sqrt{\tfrac{k}{n}}\, \hat\sigma_1 \sqrt{c_\alpha}\right) \ge 1 - \alpha,$$

with $c_\alpha$ the upper $\alpha$ point of the F distribution with k and $n - 1$ degrees of freedom.
The confidence intervals are $\bar y_i \pm (k/n)^{1/2} \hat\sigma_1 c_\alpha^{1/2}$.

It may seem unsatisfactory to use only the data from one sample point as an
estimate of $\sigma^2$; this has been done in order to have a $\chi^2$ variable for the denominator
of the F variable.

If one wishes to use a pooled estimate of the variance, $\hat\sigma_p^2 = \sum_{i=1}^{k} \hat\sigma_i^2 / k$, then
$\hat\sigma_p^2$ no longer has a $\chi^2$ distribution because of the dependence of the variables. It is
possible to show, however, that the F distribution may still be used, provided for
degrees of freedom one uses k and $n - 1$ (rather than k and $k(n - 1)$). That the
degrees of freedom may not be increased may be seen by examining the extreme
case when all the correlations are equal to one.

To establish the necessary inequality for using $\hat\sigma_p$, one may fix $\hat\sigma_1, \dots, \hat\sigma_k$
and consider the conditional probability

/ 


. k . ~— ‘ke P 
Pj; _ i/ 7 op V Ca <mw< F+ / 7 op Vast’ Uk 


/ 


/ k / ~- / ie / A A 
— V op Vea < me < G+ VV Sp Vea! G15 °*> yd 
‘ ls Zé; u wit ‘kZés . 
r (a = V j VCa NS MI < “AT VV ’ We Gea *** Yk 
a ¥ l 


k 
Kk =6 , a . / t Dé; / “s - 
V 7 kV Sa <u < HK + 4 7 E Vc loi, -*+, Oe 
ih Pigq, - kK a ae a  " K Fi ee = 
Y\ V Gi VCa S bi Aa + 4 Gi V Ca» "Yk 
t=) 1 n 


k ‘ k J 
4 Gi Vila S Me S Ye V Gi V Ca) O1,°°* Oe k 
7 y n 


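Thus for the unconditional probability one has:

$$P\left(\bar y_1 - \sqrt{\tfrac{k}{n}}\, \hat\sigma_p \sqrt{c_\alpha} \le \mu_1 \le \bar y_1 + \sqrt{\tfrac{k}{n}}\, \hat\sigma_p \sqrt{c_\alpha}, \; \dots, \; \bar y_k - \sqrt{\tfrac{k}{n}}\, \hat\sigma_p \sqrt{c_\alpha} \le \mu_k \le \bar y_k + \sqrt{\tfrac{k}{n}}\, \hat\sigma_p \sqrt{c_\alpha}\right) \ge 1 - \alpha.$$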

6. Regions based on a Bonferroni inequality. Confidence regions can be obtained
very simply using a Bonferroni inequality [5]. The use of this inequality
in a related situation was suggested by E. Paulson [6].


6.1. Variances known. Let $n_k(y_1, \dots, y_k; \mu_i, \sigma_i, \rho_{is})$ be the frequency
function of k normally distributed variables with means $\mu_1, \dots, \mu_k$, known
variances $\sigma_1^2, \dots, \sigma_k^2$, and unknown correlations $\rho_{is}$. Let $\bar y_i$ be the mean of a
random sample of size n, $y_{i1}, \dots, y_{in}$.

Let $z_i = (\bar y_i - \mu_i) n^{1/2} / \sigma_i$, $i = 1, \dots, k$. Then the joint frequency function of
$z_1, \dots, z_k$ is $n_k(z_1, \dots, z_k; 0, 1, \rho_{is})$, and

$$P(-c < z_1 < c, \dots, -c < z_k < c) = \int_{-c}^{c} \cdots \int_{-c}^{c} n_k(z_1, \dots, z_k; 0, 1, \rho_{is})\, dz_1 \cdots dz_k.$$

Using a Bonferroni inequality, this integral is greater than or equal to
$1 - 2k(1 - N(c))$, where N is the c.d.f. of a standard normal variable. Setting this
expression equal to $1 - \alpha$, $c_\alpha$ may be defined by $N(c_\alpha) = 1 - (\alpha/2k)$. Then

$$P\left(-c_\alpha < \frac{(\bar y_1 - \mu_1)\sqrt{n}}{\sigma_1} < c_\alpha, \; \dots, \; -c_\alpha < \frac{(\bar y_k - \mu_k)\sqrt{n}}{\sigma_k} < c_\alpha\right) = P[R \text{ covers } (\mu_1, \dots, \mu_k)] \ge 1 - \alpha,$$

where R is bounded by

$$\mu_i = \bar y_i \pm \frac{\sigma_i}{\sqrt{n}}\, c_\alpha.$$
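A small simulation can illustrate that the Bonferroni intervals hold their bounded confidence level when the variables are correlated. This is a hedged sketch only: it assumes an equicorrelated structure with a common nonnegative correlation `rho` (an illustrative choice, not a case singled out in the text), it simulates the standardized means $z_i$ directly, and the function name `bonferroni_coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def bonferroni_coverage(k, rho, alpha, reps=20000, seed=1):
    """Return (c_alpha, estimated P(-c_alpha < z_i < c_alpha for all i))."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - alpha / (2 * k))   # N(c_alpha) = 1 - alpha/(2k)
    hits = 0
    for _ in range(reps):
        shared = rng.gauss(0, 1)                    # common component inducing correlation rho
        covered = True
        for _ in range(k):
            z = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
            if abs(z) >= c:
                covered = False
        hits += covered
    return c, hits / reps

c_alpha, coverage = bonferroni_coverage(k=3, rho=0.5, alpha=0.05)
print(round(c_alpha, 3), coverage)
```

The estimated coverage should lie at or above $1 - \alpha$ up to simulation error, in line with the inequality above; the simulation is not a substitute for the proof.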

6.2. Variances unknown but equal. Let $y_i$, $i = 1, \dots, k$, have the joint
frequency function $n_k(y_1, \dots, y_k; \mu_i, \sigma, \rho_{is})$, where the variances are unknown
but equal. Let $z_i = (\bar y_i - \mu_i) n^{1/2} / \sigma$, $i = 1, \dots, k$.

We wish to define Student t-variables $t_1, \dots, t_k$ using $z_1, \dots, z_k$ in the numerators
and using the same Chi-square variable in the denominators. If
$u_i = \sum_{j=1}^{n} (y_{ij} - \bar y_i)^2 / \sigma^2$, then $u_i$ is a Chi-square variable with $n - 1$ degrees of freedom.
Since the $u_i$ are not independent of each other, we choose one, say $u_1$, to use in
all the denominators, rather than use their sum which does not have a Chi-square
distribution.

Then

$$t_i = \frac{\sqrt{n - 1}\, z_i}{\sqrt{u_1}} = \frac{\sqrt{n}\,(\bar y_i - \mu_i)}{\hat\sigma_1}, \quad i = 1, \dots, k,$$

are Student t-variables with the same denominators. Their distribution function
[3] is:

$$f_{n-1}(t_1, \dots, t_k; \rho_{is}) = \frac{\Gamma\left(\frac{k + n - 1}{2}\right) |(\rho^{is})|^{1/2}}{(n - 1)^{k/2} \pi^{k/2}\, \Gamma\left(\frac{n-1}{2}\right)} \left[1 + \frac{\sum_{i=1}^{k} \sum_{s=1}^{k} \rho^{is} t_i t_s}{n - 1}\right]^{-(k + n - 1)/2},$$

where $\rho^{is}$ is an element of $(\rho^{is}) = (\rho_{is})^{-1}$, and $|(\rho^{is})|$ is the determinant of $(\rho^{is})$.

As in 6.1,

$$P(-c < t_1 < c, \dots, -c < t_k < c) = \int_{-c}^{c} \cdots \int_{-c}^{c} f_{n-1}(t_1, \dots, t_k; \rho_{is})\, dt_1 \cdots dt_k \ge 1 - 2k(1 - H_{n-1}(c)),$$

where $H_{n-1}$ is the c.d.f. of a t-variable with $n - 1$ degrees of freedom.
The set of confidence intervals is then

$$\bar y_i \pm \frac{\hat\sigma_1}{\sqrt{n}}\, c_\alpha,$$

where

$$H_{n-1}(c_\alpha) = 1 - \frac{\alpha}{2k}$$

and

$$\hat\sigma_1^2 = \sum_{j=1}^{n} (y_{1j} - \bar y_1)^2 / (n - 1).$$

As in section 5.2, it is possible in these confidence intervals to replace $\hat\sigma_1$ by
$\hat\sigma_p$, the pooled estimate of the variance; $n - 1$ must be retained as the degrees
of freedom.


7. Regions with bounded confidence level using inequalities between dependent
and independent cases.


7.1. Variances known. For $y_1, \dots, y_k$ independently normally distributed
with unknown means $\mu_1, \dots, \mu_k$ and known variances $\sigma_1^2, \dots, \sigma_k^2$, let $x_i$ be
defined by $x_i = n^{1/2}(\bar y_i - \mu_i)/\sigma_i$, where $\bar y_i$ is the mean of the n observations on
the ith variable. Then

$$P(-c_\alpha < x_1 < c_\alpha, \dots, -c_\alpha < x_k < c_\alpha) = \prod_{i=1}^{k} P(-c_\alpha < x_i < c_\alpha) = 1 - \alpha,$$

where $c_\alpha$ is defined by $N(c_\alpha) = \frac{1}{2}[1 + (1 - \alpha)^{1/k}]$, with N the c.d.f. of the univariate
normal distribution. The set of simultaneous confidence intervals whose
exact confidence level is $1 - \alpha$ is then $\bar y_i \pm \sigma_i c_\alpha / n^{1/2}$.

If, now, the $y_1, \dots, y_k$ are defined as above except that now there may be
correlations among them, the same confidence intervals can be used as a set with
bounded confidence level, provided it can be proved that

$$P(-c_\alpha < x_1 < c_\alpha, \dots, -c_\alpha < x_k < c_\alpha) \ge 1 - \alpha.$$

The proof of the following theorem establishes this inequality for certain cases.


THEOREM. If $x_1, \dots, x_k$ are normally distributed with zero means, unit variances,
and correlations $\rho_{is}$, then

$$\int \cdots \int_C n_k(x_1, \dots, x_k; 0, 1, \rho_{is})\, dx_1 \cdots dx_k \ge \left[\int_{x=-c}^{x=c} n(x; 0, 1)\, dx\right]^k,$$

provided (1) k = 2 or 3; or (2) $\rho_{is} = b_i b_s$, for $i, s = 1, 2, \dots, k$, $i \ne s$ and with
$0 < b_i < 1$, $i = 1, 2, \dots, k$. The region of integration C is the region bounded by
the planes $x_i = \pm c$, $i = 1, \dots, k$; $n_k(x_1, \dots, x_k; 0, 1, \rho_{is})$ is the frequency function
of $x_1, \dots, x_k$; and $n(x; 0, 1)$ is the standard univariate normal frequency
function.


PROOF. (1) k = 2, 3. For brevity the proof is merely outlined. The expression
$\int \cdots \int_C n_k(x_1, \dots, x_k; 0, 1, \rho_{is})\, dx_1 \cdots dx_k$ may be regarded as a function of
the $\rho_{is}$, say $F(\rho_{is})$. The proof consists in showing that for all admissible $\rho_{is}$,
$F(\rho_{is})$ has an absolute minimum at the origin of the $\rho_{is}$ space.

First it must be shown that there is a relative minimum at the origin. This 
can be shown for any k by considering the various first and second partial de- 
rivatives with respect to the correlations. 

The first partial derivative with respect to $\rho_{12}$, say $F_{12}$, can be shown to be:

$$F_{12} = 2 \int_{x_3=-c}^{x_3=c} \cdots \int_{x_k=-c}^{x_k=c} [n_k(c, c, x_3, \dots, x_k; 0, 1, \rho_{is}) - n_k(c, -c, x_3, \dots, x_k; 0, 1, \rho_{is})]\, dx_3 \cdots dx_k.$$

Similarly, the second derivative with respect to $\rho_{12}$ and $\rho_{pq}$, say $F_{12,pq}$, is

$$F_{12,pq} = 2 \int_{x_3=-c}^{x_3=c} \cdots \int_{x_k=-c}^{x_k=c} n_k(c, c, x_3, \dots, x_k; 0, 1, \rho_{is}) \left[-\rho^{pq} + \left(\sum_{i=1}^{k} \rho^{ip} x_i\right)\left(\sum_{i=1}^{k} \rho^{iq} x_i\right)\right] dx_3 \cdots dx_k$$

minus a similar integral with $x_1 = c$, $x_2 = -c$.

When all the $\rho_{is}$'s are zero, it is easily seen that $F_{12}$ vanishes. Further, $F_{12,pq}$
vanishes also at the origin unless p = 1 and q = 2, while $F_{12,12}$ is seen to be
positive.

Thus in the expansion of $F(\rho_{is})$ about the origin, the first degree terms vanish
and the second degree terms form a positive definite quadratic form, so that
$F(\rho_{is})$ has a relative minimum at the origin for any k.

The next part of the proof is to show, from the form of the first derivative, that
at any point other than the origin, at least one of the first derivatives differs from zero.
This was done only for k = 2 and 3.

The set of all admissible points (points such that $(\rho_{is})$ is positive definite and
$0 < |(\rho_{is})| \le 1$), together with the boundary points, form a compact set, so that
$F(\rho_{is})$ must assume an absolute minimum either at an admissible point or at a
boundary point. Hence if it can be shown that no point on the boundary of the set
yields an absolute minimum, then the absolute minimum of F must be at the
origin.

For k = 2, the boundary points are just $\rho_{12} = \pm 1$, and they actually yield
absolute maxima for $F(\rho_{12})$.

For k = 3, a boundary point, say $(\rho_{12}, \rho_{13}, \rho_{23})$, was considered. It was shown
that for m sufficiently close to 1 but less than 1, $(m\rho_{12}, m\rho_{13}, \rho_{23})$ is an admissible
point, and that the derivative of F at $(m\rho_{12}, m\rho_{13}, \rho_{23})$ in the direction of
$(\rho_{12}, \rho_{13}, \rho_{23})$ is positive. Hence $(\rho_{12}, \rho_{13}, \rho_{23})$ cannot yield an absolute minimum
of F.

This completes the outline of the proof for k = 2 and 3, with any correlation
matrix.

(2) For any k, if $\rho_{is} = b_i b_s$, with $0 < b_i < 1$ for $i = 1, \dots, k$, a proof may be
given which is adapted from the proof of a similar theorem by C. W. Dunnett and
M. Sobel [3].

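For $y_0, y_1, \dots, y_k$ independently normally distributed, with zero means and
unit variances, define

$$x_i = \sqrt{1 - b_i^2}\; y_i - b_i y_0, \quad i = 1, \dots, k.$$

Then the $x_i$'s are normally distributed with means zero, unit variances, and
correlations $\rho_{is} = b_i b_s$.
The theorem may be restated as follows:

$$P(-c < x_1 < c, \dots, -c < x_k < c) \ge \prod_{i=1}^{k} P(-c < x_i < c),$$

or

$$P\left(-c < \sqrt{1 - b_1^2}\; y_1 - b_1 y_0 < c, \; \dots, \; -c < \sqrt{1 - b_k^2}\; y_k - b_k y_0 < c\right) \ge \prod_{i=1}^{k} P\left(-c < \sqrt{1 - b_i^2}\; y_i - b_i y_0 < c\right),$$

or

$$P(d_1 < y_1 < e_1, \dots, d_k < y_k < e_k) \ge \prod_{i=1}^{k} P(d_i < y_i < e_i),$$

where

$$d_i = \frac{-c + b_i y_0}{\sqrt{1 - b_i^2}}, \quad e_i = \frac{c + b_i y_0}{\sqrt{1 - b_i^2}}, \quad i = 1, \dots, k.$$

This may be written as:

$$\int_{y_0=-\infty}^{\infty} \left[\int_{y_1=d_1}^{e_1} \cdots \int_{y_k=d_k}^{e_k} n_k(y_1, \dots, y_k; 0, 1, 0)\, dy_1 \cdots dy_k\right] n_1(y_0; 0, 1)\, dy_0 \ge \prod_{i=1}^{k} \int_{y_0=-\infty}^{\infty} \left[\int_{y_i=d_i}^{e_i} n_1(y_i; 0, 1)\, dy_i\right] n_1(y_0; 0, 1)\, dy_0,$$

or

$$\int_{y_0=-\infty}^{\infty} \left[\prod_{i=1}^{k} F_i(y_0)\right] n_1(y_0; 0, 1)\, dy_0 \ge \prod_{i=1}^{k} \int_{y_0=-\infty}^{\infty} F_i(y_0)\, n_1(y_0; 0, 1)\, dy_0,$$

where

$$F_i(y_0) = \int_{d_i}^{e_i} n_1(y_i; 0, 1)\, dy_i.$$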

Thus the inequality becomes:

$$E\left(\prod_{i=1}^{k} F_i(y_0)\right) \ge \prod_{i=1}^{k} E(F_i(y_0)).$$

The expected value of a product of monotone bounded functions is greater than
or equal to the product of their expected values [6], so that the last inequality
would hold if the $F_i$ were monotone. The functions $F_i(y_0)$, however, are seen to
increase from $-\infty$ to 0 and to decrease from 0 to $\infty$. Since the frequency function
of $y_0$ is symmetric about the origin, the transformation $z = |y_0|$ changes the
inequality to

$$E\left(\prod_{i=1}^{k} F_i(z)\right) \ge \prod_{i=1}^{k} E(F_i(z)),$$

where $F_i(z)$ are monotonically decreasing bounded functions. This completes the
proof of the theorem.

7.2. Variances unknown but equal. When the variances are unknown but
equal, Student t-variables $t_i$ with the joint frequency function

$$f_{n-1}(t_1, \dots, t_k; \rho_{is}),$$

as defined in 6.2, are used to form confidence intervals. Using the same methods
as in 7.1, the following theorem can be proved:

THEOREM. For k = 2 or 3,

$$\int \cdots \int_C f_{n-1}(t_1, \dots, t_k; \rho_{is})\, dt_1 \cdots dt_k \ge \int \cdots \int_C f_{n-1}(t_1, \dots, t_k; 0)\, dt_1 \cdots dt_k.$$

For $\rho_{is} = b_i b_s$, with $0 < b_i < 1$, $i = 1, \dots, k$,

$$\int \cdots \int_C f_{n-1}(t_1, \dots, t_k; \rho_{is})\, dt_1 \cdots dt_k \ge \left[\int_{-c}^{c} f_{n-1}(t)\, dt\right]^k.$$

In this theorem, whose proof follows the same lines as the one in 7.1, C is the
region bounded by $t_i = \pm c$, $i = 1, \dots, k$, $f_{n-1}(t)$ is the density function of a
Student t-variable with $n - 1$ degrees of freedom, and $f_{n-1}(t_1, \dots, t_k; 0)$ is the
joint frequency function of the t-variables when all $\rho_{is}$ are zero.

Since

$$t_i = \frac{\sqrt{n}\,(\bar y_i - \mu_i)}{\hat\sigma_1}, \quad i = 1, \dots, k,$$

sets of confidence intervals obtained are as follows:
For k = 2 or 3, $\bar y_i \pm (\hat\sigma_1 / n^{1/2}) c_\alpha$, where $c_\alpha$ is defined by

$$\int_{-c_\alpha}^{c_\alpha} \cdots \int_{-c_\alpha}^{c_\alpha} f_{n-1}(t_1, \dots, t_k; 0)\, dt_1 \cdots dt_k = 1 - \alpha.$$

For any k and $\rho_{is} = b_i b_s$, $0 < b_i < 1$, $i = 1, \dots, k$, the same set is obtained,
but with $c_\alpha$ defined by $H_{n-1}(c_\alpha) = (1 + (1 - \alpha)^{1/k})/2$, where $H_{n-1}$ is the c.d.f. of
a Student t-variable with $n - 1$ degrees of freedom. As in sections 5.2 and 6.2 one
may use $\hat\sigma_p$ in place of $\hat\sigma_1$, provided one keeps $n - 1$ as the degrees of freedom.

8. Comparison of confidence intervals. In Table I are listed various sets of 
confidence intervals, with their properties and restrictions. 

One rather obvious way to compare them is by comparing their lengths, or the
expected values of the lengths. In Table II are given numerical values of $d_\alpha$ for
$1 - \alpha = .95$, where

$$d_\alpha = \frac{\sqrt{n}}{2\sigma} \sqrt{E(\ell^2)},$$

with $\ell$ the length of the confidence interval. Throughout Table II, the variances
are assumed to be equal.

When the variances are known and equal, and all the correlations are zero, the
shortest set of confidence intervals must be those of section 7.1. When nothing is
known about the correlations, no shorter set can be obtained. The last column
in section 7 of Table II therefore gives the smallest obtainable values for $d_\alpha$, and
may be used as a standard for comparison.

For $1 - \alpha = .95$, the Bonferroni inequality intervals of section 6 are almost as
good as the best ones. Indeed for $1 - \alpha$ as low as .80, the values of $d_\alpha$ are still
very close, being:

     k    Bonferroni    "Best"
     1       1.28        1.28
     2       1.64        1.61
     4       1.96        1.92
     6       2.13        2.09
     8       2.24        2.20
    10       2.33        2.29
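
The comparison above can be checked numerically. The following is a minimal sketch, assuming the known-variance case, in which the tabulated quantity reduces to the normal critical value $c_\alpha$ itself, and taking the two definitions of $c_\alpha$ from Table I: $N(c_\alpha) = 1 - \alpha/(2k)$ for the Bonferroni intervals and $N(c_\alpha) = (1 + (1 - \alpha)^{1/k})/2$ for the "best" intervals. The function names are illustrative and do not come from the paper.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bonferroni_c(k, alpha):
    """Critical value solving N(c) = 1 - alpha / (2k)."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * k))


def best_c(k, alpha):
    """Critical value solving N(c) = (1 + (1 - alpha)**(1/k)) / 2."""
    return STANDARD_NORMAL.inv_cdf((1.0 + (1.0 - alpha) ** (1.0 / k)) / 2.0)


if __name__ == "__main__":
    alpha = 0.20  # confidence level 1 - alpha = .80, as in the small table above
    for k in (1, 2, 4, 6, 8, 10):
        print(k, round(bonferroni_c(k, alpha), 3), round(best_c(k, alpha), 3))
```

Because $(1 - \alpha)^{1/k} \le 1 - \alpha/k$, the "best" value can never exceed the Bonferroni value; the two coincide for $k = 1$.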





TABLE I 
Conditions 


Confidence Intervals for Means of Dependent, Normally Distributed Variables 
Definition of ¢o 


Section Confidence Intervals 
. h : 1+ = 4) 
3.1 D aiyij % — C1°Ca Nic.) = . -in 2 k (1) 
Dad n 2 
. kook 1+ (1 — a)! 
32 1D ayy + p< — > es Hy_4(Ca) = re>taD 
—" n(n — k) m=1 2 
a, ——- . 
1 ii t-—=-: Voeu eles eo te (4 
Vn 
a . 
1.2 Vi == * Ce F(ca) = 1 a n>k (5 
Vn 
- 0; . 
2 Wit - Ca Us(ca) = 1 —a (4 
Vv n 
, a , 
5.2 i+ —= Vea Fin-i(Ca) = a 6 
Vn 
; = Vv t-= 1 
6 Yi Ca (ca) = - 
Vn 2} 
és a 
6.2 Yyit Y H, 1\Ca) = 1 a 2) 
Vn 2k 
~ 0; i. 1+ (1 a)! k 
cen Vit —= t N (ca = (1 
Vn 2 
k= 2, 3 Or pu = byb, 
- 3; 1+ (1 - a 1/k 
1.2 Yt °* Le H,, (Ca) = . 
Vn - 
k=2,3 (2, 3) 
or pis = Dib, (2 
(1) $N$ is the cumulative standard normal distribution function.
(2) $H_\nu$ is the cumulative distribution function of a Student $t$-variable with $\nu$ degrees of
freedom.
(3) This definition of $c_\alpha$ is approximate. The exact definition is
\[ \int_{-c_\alpha}^{c_\alpha} \cdots \int_{-c_\alpha}^{c_\alpha} f_\nu(t_1, \ldots, t_k) \, dt_1 \cdots dt_k = 1 - \alpha, \]
where
\[ f_\nu(t_1, \ldots, t_k) = \frac{\Gamma\left(\frac{k + \nu}{2}\right)}{(\pi\nu)^{k/2} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{1}{\nu} \sum_{i=1}^{k} t_i^2\right)^{-(k + \nu)/2}, \]
where $\nu$ is the degrees of freedom of $\hat{\sigma}$.
(4) $U_k$ is the cumulative distribution function of a Chi-square variable with $k$ degrees of
freedom.
(5) $F$ is the cumulative distribution function of Hotelling's $T$.
(6) $F_{k,n-1}$ is the cumulative distribution function of an $F$ variable with $k$ and $n - 1$
degrees of freedom.

TABLE II

Comparison of Lengths of Confidence Intervals for Means of Dependent, Normally
Distributed Variables with Equal Variances, $1 - \alpha = .95$*
n 
k Variances Unknown Variances Known 
Section 4 6 8 10 20 Section Any 
4.1 :.2 
l 3.18 2.57 2.36 2.26 2.09 1.96 
2 10.8 5.52 4.53 4.12 3.55 3.17 
t 27.9 11.4 8.63 6.13 4.98 
6 42, 3 17.8 8.75 6.44 
Ss 77.0 11.8 7.42 
10 15.6 S85 
5.1 5.2 
l 3.18 2.57 2.36 2.26 2.09 1.96 
2 7.55 4.16 3.47 3.17 2.74 2.45 
4 13.9 6.70 5.21 3.79 3.08 
6 20.1 9.12 4.82 3.55 
S 26.4 6.01 3.94 
10 7.53 4.28 
6.1 6.2 
1 3.18 2.57 2.36 2.26 2.09 1.96 
2 4.37 3.40 3.08 2.92 2.66 2.46 
4 6.04 4.56 4.06 3.81 3.41 3.08 
6 1.82 5.45 4.82 4.50 3.98 3.55 
8 8.41 6.21 5.37 5.08 4.46 3.94 
10 9.38 6.88 6.03 5.60 4.89 4.28 
‘2 7.2 
l 3.18 2.57 2.36 2.26 2.09 1.96 
2 4.17 3.16 2.84 2.68 2.44 2.24 
+t 5.41 3.80 3.33 3.1] 2.76 2.50 
6 6.22 4.22 3.64 3.36 2.94 2.64 
& 6.92 4.53 3.86 3.55 3.07 2.74 
10 7.47 4.77 4.03 3.69 S87 2.81 
8.1 8.2 
l 3.18 2.57 2.36 2.26 2.09 1.96 
2 4.16 3.15 2.83 2.68 2.43 2.24 
+ 5.35 3.79 3.02 3.10 2.75 2.49 
6 6.17 4.20 3.62 3.35 2.94 2.63 ’ 
: 6.86 1.50 3.54 3.53 3.07 2.73 
10 7.40 4.76 4.01 3.67 3.16 2.80 


* The figures given in the table are values of $(n^{1/2}/\sigma)\sqrt{E(\tfrac{1}{4}\ell^2)}$, where $\ell$ is the length of the
confidence interval.

It would be interesting to show that the “best” intervals can be used for arbitrary
$k$ and arbitrary correlations, but from a practical viewpoint, for $1 - \alpha$
large enough to be of interest, the Bonferroni regions are good enough.

The regions of section 5, based on the $T$-distribution and the $\chi^2$ distribution,
compare favorably only when $k$ is small and $n$ relatively large. The regions with
exact confidence level are everywhere unnecessarily long.

A MARKOVIAN FUNCTION OF A MARKOV CHAIN¹

By C. J. Burke and M. Rosenblatt
Indiana University


1. Statement of the problem and the results obtained. Consider a Markov
chain $X(n)$, $n = 0, 1, 2, \ldots$, with a finite number of states $1, \ldots, m$ and
stationary transition probability matrix $P = (p_{ij})$,

\[ p_{ij} = P[X(n + 1) = j \mid X(n) = i] \ge 0, \quad i, j = 1, \ldots, m, \qquad \sum_j p_{ij} = 1. \tag{1} \]

The probability structure of the chain is determined by $P$ and the initial
probability distribution vector $p = (p_i)$,

\[ p_i = P[X(0) = i] \ge 0, \quad i = 1, \ldots, m, \qquad \sum_i p_i = 1. \tag{2} \]

Suppose the experimenter does not observe the process $X(n)$ but rather a derived
process $Y(n) = f(X(n))$ where $f$ is a given function on $1, \ldots, m$. The
states $i$ of the original process $X(n)$ on which $f$ equals some fixed constant are
collapsed into a single state of the new process $Y(n)$. Call these collapsed sets
of states $S_i$, $i = 1, \ldots, r$, $r < m$. A natural question that arises is as to whether
or not the new process is Markovian. It is clear that this is not generally the 
case. 

Let us restrict ourselves to a process $X(n)$ with its initial probability distribution
a left invariant vector of the matrix $P$, that is, $pP = p$. Further assume that
all the components of $p$ are positive (all transient states are thrown out). Let
$D$ be the diagonal matrix with its $i$th diagonal entry $p_i$. The process is said to
be reversible if

\[ DP = P'D \]

($P'$ is the transpose of $P$). The following result is obtained:

THEOREM 1. Let $X(n)$ be a stationary reversible process with $p_i > 0$ for all $i$.
Then $Y(n)$ is Markovian if and only if for any fixed $\beta = 1, \ldots, r$

\[ \sum_{j \in S_\beta} p_{ij} = P[X(n + 1) \in S_\beta \mid X(n) = i] = C_{S_\alpha, S_\beta} \tag{3} \]

has the same value for all $i$ in any given collapsed set of states $S_\alpha$, $\alpha = 1, \ldots, r$.
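
Condition (3) is easy to test mechanically for a given matrix and a given collapsing. The following is a minimal sketch in plain Python; the function names, the 0-based state labels, and the two small example matrices are illustrative assumptions and do not come from the paper. The check covers only condition (3) itself; the "only if" half of Theorem 1 additionally requires stationarity and reversibility.

```python
def block_sums(P, blocks):
    """For every state i, the probabilities of jumping from i into each block."""
    return [[sum(row[j] for j in block) for block in blocks] for row in P]


def satisfies_condition_3(P, blocks, tol=1e-12):
    """True if, within every block, all states have the same block-jump probabilities."""
    sums = block_sums(P, blocks)
    for block in blocks:
        reference = sums[block[0]]
        for i in block[1:]:
            if any(abs(a - b) > tol for a, b in zip(sums[i], reference)):
                return False
    return True


# Hypothetical 3-state examples; states 0 and 1 are collapsed, state 2 is left alone.
BLOCKS = [[0, 1], [2]]
P_OK = [[0.5, 0.2, 0.3],
        [0.2, 0.5, 0.3],
        [0.3, 0.3, 0.4]]
P_BAD = [[0.5, 0.2, 0.3],
         [0.1, 0.5, 0.4],
         [0.3, 0.3, 0.4]]

if __name__ == "__main__":
    print(satisfies_condition_3(P_OK, BLOCKS))
    print(satisfies_condition_3(P_BAD, BLOCKS))
```

In `P_OK` both collapsed states jump into the collapsed set with probability 0.7, so the constant $C_{S_\alpha, S_\beta}$ exists; in `P_BAD` the corresponding sums are 0.7 and 0.6.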


A slightly different problem can be phrased in the following way. Let
$w = (w_i)$, $w_i > 0$, $i = 1, \ldots, m$,
be any initial probability distribution. Consider the Markov process $X(n)$
generated by initial distribution $w$ and transition probability matrix $P$. Again
consider $Y(n) = f(X(n))$ and require that $Y(n)$ be Markovian whatever the
initial distribution $w$.

Received November 13, 1957; revised March 31, 1958.

1 Work sponsored in part by the Office of Naval Research under Contract Nonr-908(10).
Reproduction in whole or in part is permitted for the purpose of the United States
Government.

2 J. L. Snell pointed out that the original proof, given for Markov processes $X(n)$ with
a symmetric $P$, holds for the reversible processes.

COROLLARY 1. A sufficient condition that $Y(n)$ be Markovian whatever the
initial distribution $w$ of $X(n)$ is given by (3). Nonetheless, condition (3) is not
generally necessary if the collapsed process is to be Markovian even in the
problem covered in Corollary 1.

THEOREM 2. Let $f$ be a function that collapses only one class of states $S$. $Y(n)$
is Markovian whatever the initial distribution $w$ of $X(n)$ if and only if one of the
following two conditions is satisfied:

(i)
\[ \sum_{l \in S} p_{kl} \, p_{lu} = p_{k,S} \, C_u \tag{4} \]
for all $u \notin S$ and all $k$;

(ii)
\[ p_{i,S} = 0 \quad \text{for all } i \notin S. \tag{5} \]

Here

\[ p_{k,S} = \sum_{j \in S} p_{kj} = P[X(n + 1) \in S \mid X(n) = k]. \]

An example of a Markov chain satisfying (4) but not (3) is given in the body
of the paper.
Condition (4) naturally suggests the condition given in Corollary 2.
COROLLARY 2. A sufficient condition that $Y(n)$ be Markovian, whatever the
initial distribution $w$ of $X(n)$, is given by

\[ \sum_{l \in S_\alpha} p_{kl} \, p_{l,S_\beta} = p_{k,S_\alpha} \, C_{S_\alpha, S_\beta} \tag{4'} \]

for all $k$, $\alpha$, $\beta$.

Suppose we now go back and consider the class of stationary Markov chains
$X(n)$ with $p_i > 0$, $i = 1, \ldots, m$, such that $Y(n) = f(X(n))$ is Markovian for
any many-one transformation $f$.

THEOREM 3. Let $X(n)$ be a stationary Markov chain with $p_i > 0$, $i = 1, \ldots, m$.
$f(X(n))$ is Markovian for every many-one transformation $f$ if and only if the
transition probability matrix $P$ of $X(n)$ is of the form

\[ P = aI + (1 - a)U, \tag{6} \]

where $U$ is a matrix with identical rows and $a$ is a real number.
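
The "if" half of Theorem 3 can be illustrated numerically: a matrix of the form (6) satisfies the sufficient condition (3) for every way of collapsing the states. The following is a minimal sketch in plain Python; the helper names, the 0-based state labels, and the example values of $a$ and of the common row of $U$ are illustrative assumptions, not taken from the paper.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def satisfies_condition_3(P, blocks, tol=1e-12):
    """True if states in the same block have equal jump probabilities into every block."""
    sums = [[sum(row[j] for j in block) for block in blocks] for row in P]
    return all(
        abs(a - b) <= tol
        for block in blocks
        for i in block[1:]
        for a, b in zip(sums[i], sums[block[0]])
    )


def mixture_matrix(a, u):
    """P = a*I + (1 - a)*U, where every row of U equals the probability vector u."""
    m = len(u)
    return [[(a if i == j else 0.0) + (1.0 - a) * u[j] for j in range(m)]
            for i in range(m)]


# Hypothetical example: four states, a = 0.4, common row of U = (0.1, 0.2, 0.3, 0.4).
P_FORM_6 = mixture_matrix(0.4, [0.1, 0.2, 0.3, 0.4])
ALL_PARTITIONS = list(set_partitions([0, 1, 2, 3]))

if __name__ == "__main__":
    print(len(ALL_PARTITIONS))
    print(all(satisfies_condition_3(P_FORM_6, blocks) for blocks in ALL_PARTITIONS))
```

The reason is visible in the construction: for a state $i$ in $S_\alpha$, the probability of jumping into $S_\beta$ is $a$ times the indicator of $\alpha = \beta$ plus $(1 - a)$ times the $U$-mass of $S_\beta$, which does not depend on which $i$ in $S_\alpha$ is chosen.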

It is interesting to note that when one goes to the case of a decent continuous
parameter Markov chain with a finite number of states, the analogue of (3)
becomes almost necessary for $Y(t)$ to be Markovian, whatever the initial
probability distribution $w$ of $X(t)$.

THEOREM 4. Let $X(t)$, $0 \le t < \infty$, be a Markov chain with a finite number of
states $i = 1, \ldots, m$ and stationary transition probability function

\[ P(t) = (p_{ij}(t)), \qquad p_{ij}(t) = P[X(t + \tau) = j \mid X(\tau) = i], \]

continuous in $t$. Assume that

\[ \lim_{t \downarrow 0} P(t) = I. \]

Clearly

\[ P(t)P(s) = P(t + s), \qquad t, s > 0. \]

Let the initial probability distribution of $X(t)$ be $w$, $w_i > 0$, $i = 1, \ldots, m$. Then
$Y(t) = f(X(t))$ is Markovian, whatever the initial distribution $w$ of $X(t)$, if and
only if for each $\beta = 1, \ldots, r$ separately either
(7)    (i) $p_{i,S_\beta}(t) = 0$ for all $i \notin S_\beta$, or
       (ii) $p_{i,S_\gamma}(t) = C_{S_\beta, S_\gamma}(t)$ for every $i \in S_\beta$ and all $\gamma = 1, \ldots, r$.
Part of the interest in the proofs of Theorems 1 and 4 lies in the fact that 
they show that if the collapsed processes in these cases satisfy the Chapman- 
Kolmogorov equations, they are Markovian. 
Condition (3) can be reworded in the case of a Markov process $X(t)$,
$0 \le t < \infty$, with stationary transition probabilities and values in an abstract space.
Let $\Omega$ be a space of points $x$ and $B(\Omega)$ a Borel field on $\Omega$. Further let the sets
$\{x\}$ be elements of $B(\Omega)$. Consider a function

\[ P(t; x, A), \qquad A \in B(\Omega), \]

satisfying
(i) $P(t; x, A)$ is a Baire function of $x$ for fixed $t$, $A$;
(ii) $P(t; x, A)$ is a probability measure in $A \in B(\Omega)$ for fixed $t$, $x$;
(iii) $P(t; x, A)$ satisfies the Chapman-Kolmogorov equation

\[ P(t + \tau; x, A) = \int_\Omega P(t; y, A) \, P(\tau; x, dy), \qquad t, \tau > 0. \]

Let $X(t)$ be a Markov chain with $P(t; x, A)$ as its transition probability function.
Let $f$ be a function from $\Omega$ onto another space of points $\Omega'$. The function
$f$ induces a Borel field of sets $B(\Omega') = f(B(\Omega))$ on $\Omega'$. This consists of sets of the
form $fA = \{y \mid y = f(x),\ x \in A\}$, $A \in B(\Omega)$. Now consider the inverse images
of sets in $f(B(\Omega))$. The class of sets of this form we call $f^{-1}f(B(\Omega))$ and it is a
sub-Borel field of $B(\Omega)$ consisting of sets of the form

\[ \{z \in \Omega \mid z \in f^{-1}f(x),\ x \in A\}, \qquad A \in B(\Omega). \]

The analogue of condition (3) is simply that

\[ P(t; x, A), \qquad A \in f^{-1}f(B(\Omega)), \tag{8} \]

be a Baire function of $x$ with respect to $f^{-1}f(B(\Omega))$ for fixed $t$, $A$.
COROLLARY 3. $Y(t) = f(X(t))$ is a Markov process, whatever the initial
probability distribution of $X(t)$, if condition (8) is satisfied. Condition (8) is discussed
in a paper of B. Rankin [4] as a sufficient condition for a collapsed Markovian
process to be Markovian.

2. The stationary case. Let the assumptions of Theorem 1 be satisfied. The
matrix of $n$-step transition probabilities of the process $Y(n)$ is of the form

\[ Q^{(n)} = AP^nB = (q^{(n)}_{\alpha\beta}) = (P[X(t + n) \in S_\beta \mid X(t) \in S_\alpha]), \tag{9} \]

where $A$, $B$ are $r \times m$ and $m \times r$ matrices respectively. The elements of $B$ are
of the form

\[ b_{ij} = \begin{cases} 1 & \text{if } i \in S_j, \\ 0 & \text{otherwise;} \end{cases} \]

while

\[ A = (B'DB)^{-1}B'D, \tag{10} \]

where $D$ is the diagonal matrix introduced above. If the new process is Markovian,
the Chapman-Kolmogorov equation must be satisfied by the $Q^{(n)}$, that is,

\[ Q^{(n)} = AP^nB = (Q^{(1)})^n = (APB)^n, \qquad n = 2, 3, \ldots. \tag{11} \]

This condition can be reworded in an equivalent form

\[ AP^nBAPB = AP^{n+1}B, \qquad n = 1, 2, 3, \ldots. \tag{12} \]

Note that

\[ BAPB = PB \tag{13} \]

implies that (12) is satisfied. Condition (13) is just condition (3) expressed in
matrix form when the assumptions of Theorem 1 are satisfied. We first verify
that (3) implies that $Y(n)$ is Markovian. (To facilitate printing we sometimes
write $\alpha(i)$ in place of $\alpha_i$.) Clearly

\[ P[Y(0) \in S_{\alpha(0)}, \ldots, Y(n) \in S_{\alpha(n)}] = \sum_{i_0 \in S_{\alpha(0)}} \cdots \sum_{i_n \in S_{\alpha(n)}} p_{i_0} \, p_{i_0 i_1} \cdots p_{i_{n-1} i_n} = \Big( \sum_{i \in S_{\alpha(0)}} p_i \Big) C_{S_{\alpha(0)}, S_{\alpha(1)}} \cdots C_{S_{\alpha(n-1)}, S_{\alpha(n)}}, \]

and it is easily seen that

\[ C_{S_\alpha, S_\beta} = P[Y(n + 1) \in S_\beta \mid Y(n) \in S_\alpha]. \]


The sufficiency of condition (3) is thus verified. Note that the sufficiency argu- 
ment given above holds for the case of any initial distribution w and without 
the condition of reversibility. We thus have Corollary 1. 
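
The matrix formulation (9) to (13) can be tried out on a small case. The following is a minimal sketch in plain Python, assuming a hypothetical symmetric (hence reversible) 3-state matrix with uniform invariant vector in which states 0 and 1 are collapsed; all names and numbers are illustrative. The rows of $A = (B'DB)^{-1}B'D$ are written out directly as the invariant probabilities renormalized within each collapsed set.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def collapse_matrices(p, blocks):
    """Return (A, B): B is the m x r membership matrix, A = (B'DB)^{-1} B'D."""
    m = len(p)
    B = [[1.0 if i in block else 0.0 for block in blocks] for i in range(m)]
    A = []
    for block in blocks:
        mass = sum(p[i] for i in block)
        A.append([p[i] / mass if i in block else 0.0 for i in range(m)])
    return A, B


# Hypothetical example: symmetric P, so the uniform vector is invariant and DP = P'D.
P = [[0.5, 0.2, 0.3],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
p = [1.0 / 3.0] * 3
BLOCKS = [[0, 1], [2]]

A, B = collapse_matrices(p, BLOCKS)
Q = matmul(A, matmul(P, B))            # one-step matrix of the collapsed process, (9) with n = 1

if __name__ == "__main__":
    print(Q)
    print(close(matmul(B, Q), matmul(P, B)))                          # condition (13)
    print(close(matmul(Q, Q), matmul(A, matmul(matmul(P, P), B))))    # (11) with n = 2
```

Here (13) holds, so the two-step matrix of the collapsed process agrees with the square of its one-step matrix, as (11) requires.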

Let us now consider the necessity of condition (3) when $X(n)$ is reversible.
If $Y(n)$ is Markovian the Chapman-Kolmogorov equations are satisfied by the
$Q^{(n)}$ and we must have

\[ Q^{(2)} = (Q^{(1)})^2 \]

or

\[ AP(I - BA)PB = 0. \]

But this implies that

\[ B'DP(I - BA)PB = 0. \]

Because of reversibility, this can be written as

\[ B'P'D(I - BA)PB = 0. \]

Now $D(I - BA)$ is positive semidefinite so that

\[ D(I - BA) = R'R \]

for some $m \times m$ matrix $R$. Thus

\[ (RPB)'(RPB) = 0 \]

and

\[ RPB = 0. \]

But then

\[ R'RPB = D(I - BA)PB = 0 \]

and hence

\[ (I - BA)PB = 0. \]


It is worth while noting that the problems we consider are related to issues of
aggregation and consolidation in multisector models of mathematical economics
(see [5]). There one has a stochastic matrix $P$ and an invariant vector
$p$, $pP = p$.
One asks for the types of aggregation under which the aggregated invariant
vector is an invariant vector of the aggregated matrix. The aggregated matrix is
$Q = APB$, where $B$ is defined as before and $A = (B'D_vB)^{-1}B'D_v$. Here $D_v$ is
the diagonal matrix with its $i$th diagonal element $v_i$. The aggregation is
determined by the sets of states $S_i$ and the vector $v = (v_i)$. The aggregated vector
is $pB$. The question is then for what aggregation schemes the relation

\[ pBQ = pB(B'D_vB)^{-1}B'D_vPB = pB \]

is valid. Conditions (3) and (6) turn out to be crucial in some of the results
obtained in [5].

3. Any initial distribution. Let the assumptions of Theorem 2 be satisfied.
We first show that (4) is sufficient. It is enough to show that

\[ P[X(n) = i, X(n + 1) \in S, \ldots, X(n + h) \in S, X(n + h + 1) = j] \]
\[ = P[X(n) = i] \, P[X(n + 1) \in S \mid X(n) = i] \cdots P[X(n + h) \in S \mid X(n + h - 1) \in S] \, P[X(n + h + 1) = j \mid X(n + h) \in S] \]

for any $i \notin S$ and any $j$, since then $Y(n)$ is clearly Markovian. Note that (4)
implies that

\[ \sum_{l \in S} p_{kl} \, p_{l,S} = p_{k,S} \, C_S \tag{14} \]

for all $k$. By making use of (4) and (14) the following relation is obtained

\[ P[X(n + h + 1) = j, X(n + h) \in S, \ldots, X(n + 1) \in S \mid X(n) = i] = \sum_{l_1, \ldots, l_h \in S} p_{i l_1} \, p_{l_1 l_2} \cdots p_{l_{h-1} l_h} \, p_{l_h j} = p_{i,S} (C_S)^{h-1} C_j. \]

But

\[ C_j = P[X(n + 1) = j \mid X(n) \in S], \qquad j \notin S, \]

and

\[ C_S = P[X(n + 1) \in S \mid X(n) \in S]. \]

An argument paralleling the one given above indicates that (4') implies that
$Y(n)$ is Markovian so that we have Corollary 2. $Y(n)$ is obviously Markovian
if (5) is satisfied.

Now consider the necessity of (4). Since $Y(n)$ is Markovian whatever the
initial distribution $w$ of $X(n)$, the transition probabilities of $Y(n)$ satisfy the
Chapman-Kolmogorov equation. It may be that $p_{i,S} = 0$ for all $i$. Then (4) is
obviously satisfied. Suppose now that there is an $i$ such that $p_{i,S} \ne 0$. The
Chapman-Kolmogorov equation then tells us that

\[ \frac{\sum_{l \in S} \sum_k w_k \, p_{kl} \, p_{lu}}{\sum_k w_k \, p_{k,S}} = \frac{\sum_{l \in S} p_{il} \, p_{lu}}{p_{i,S}} \]

for all $i, u \notin S$. If $k$ is such that $p_{k,S} \ne 0$ then

\[ p_{i,S} \sum_{l \in S} p_{kl} \, p_{lu} = p_{k,S} \sum_{l \in S} p_{il} \, p_{lu} \tag{15} \]

as is seen by letting $w_k \to 1$ and $w_l \to 0$, $l \ne k$. And if $p_{k,S} = 0$, (15) is obviously
satisfied. Thus (15) holds for all $k$ and all $i \notin S$. If there is an $i \notin S$ such
that $p_{i,S} \ne 0$, (15) is satisfied for all $k$ and $i$. But this implies relation (4). There
is still the possibility that $p_{i,S} = 0$ for all $i \notin S$, namely condition (5).

In the context of Theorem 2 condition (3) implies that condition (4) is satis- 
fied. However, the converse is not true. Consider the transition probability 
matrix 


0 


On Ae Oi 


P 


ll 


_ 
— 


-- ol eS we Dim 


SO Or Br Ol we 
tol 
a Be wl OD 


CO # BH OF wr 
Collapse the states 1, 2, 3 into a set $S$ and leave the states 4, 5 alone. Note
that (3) is not satisfied. But (4) is satisfied since


Ds Pat Pi 


Ves oe ll 


Pi.s 
for all ug S and all k. 


4. Any function f. The answer obtained to the question posed in Theorem 3
is the same as the answer obtained in a similar problem posed by
Bush, Mosteller and others [1]. The structure of interest in Bush and Mosteller’s
problem is not Markovian. Note that in our case we ask that $f(X(n))$ have the
same structure (a Markovian structure) as $X(n)$ for any $f$ and a specific initial
probability vector, a left invariant vector $p$ of $P$. Bush and Mosteller ask that
$f(X(n))$ have the same structure as $X(n)$ for any $f$ and any initial probability
vector $w$.

Let us now prove Theorem 3. The condition imposed on the process will not
be used in full strength. Just consider a consolidation in which two states $j$, $k$
are consolidated into a set $S$ and all other states are left the same. Let $i$, $l$ be
any indices distinct from $j$, $k$. Since the consolidated process is Markovian, its
transition probabilities satisfy the Chapman-Kolmogorov equation and hence

\[ p^{(2)}_{il} = \sum_{u=1}^{m} p_{iu} \, p_{ul} = \sum_{u \notin S} p_{iu} \, p_{ul} + (p_{ij} + p_{ik}) \frac{p_j \, p_{jl} + p_k \, p_{kl}}{p_j + p_k}. \tag{16} \]

Equation (16) can be reduced to the following convenient form

\[ (p_{ij} \, p_k - p_{ik} \, p_j)(p_{jl} - p_{kl}) = 0. \tag{17} \]

Further, (17) implies that

\[ [(p_j \, p_{jj} + p_k \, p_{kj}) p_k - (p_j \, p_{jk} + p_k \, p_{kk}) p_j](p_{jl} - p_{kl}) = 0. \tag{18} \]

First consider the case in which for all $i$, $p_{ij} \, p_k = p_{ik} \, p_j$ for all $j, k \ne i$. But
then

\[ p_{ij} = (1 - \lambda_i) p_j, \qquad i \ne j, \]

where $1 - \lambda_i$ denotes the common value of $p_{ij}/p_j$, $j \ne i$, so that $P$ is of the form

\[ P = \Lambda + (I - \Lambda)U, \]

where $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_i$ and $U$ is a matrix
with identical rows $(p_1, \ldots, p_m)$. If

\[ (p_j \, p_{jj} + p_k \, p_{kj}) p_k = (p_j \, p_{jk} + p_k \, p_{kk}) p_j \tag{19} \]

for some pair of indices $j$, $k$ it follows that $\lambda_j = \lambda_k$. If (19) does not hold for
the pair $j$, $k$, (18) implies that $p_{jl} = p_{kl}$ for all $l \ne j, k$. But then $\lambda_j = \lambda_k$. Thus
it follows that in this case $\lambda_1 = \lambda_2 = \cdots = \lambda_m$.

Now on the contrary assume there is a row $i$ for which $p_{ij} \, p_k = p_{ik} \, p_j$ does
not hold for all $j, k \ne i$. Given any $j \ne i$ consider all $k$ for which we can find
a sequence $j_1, \ldots, j_a$ such that

\[ p_{ij} \, p_{j_1} = p_{ij_1} \, p_j, \quad p_{ij_1} \, p_{j_2} = p_{ij_2} \, p_{j_1}, \quad \ldots, \quad p_{ij_a} \, p_k = p_{ik} \, p_{j_a}. \]

There is a maximal set of such indices $k$ (including $j$ of course). There are at
least two such sets. The collection of all such maximal sets are disjoint. Given
any $j$ in one such maximal set and any $j'$ in another we must have

\[ p_{jl} = p_{j'l} \tag{20} \]

for all $l \ne j, j'$ and

\[ p_{jj} + p_{jj'} - p_{j'j} - p_{j'j'} = 0. \tag{21} \]

For convenience let us assume $i = 1$. Keeping (20) and (21) in mind, it is clear
that for any fixed $j \ne 1$ the $p_{kj}$'s must be equal for all $k \ne 1, j$. Call this common
value $u_j$. Thus all rows except possibly for the first must be of the form

\[ p_{kj} = \lambda\delta_{kj} + u_j. \]

There are now two possibilities. Either $p_{ij} \, p_k = p_{ik} \, p_j$ for all $i \ne 1$ and all
$j, k \ne i$, or this is not the case. If not we must have $p_{ij} = \lambda\delta_{ij} + u_j$ for all $i$. Since $p$ is
an invariant vector $u_j = (1 - \lambda)p_j$. On the other hand if $p_{ij} \, p_k - p_{ik} \, p_j = 0$
for all $i \ne 1$ and $j, k \ne i$ then $u_j = (1 - \lambda)p_j$. The elements of the first row
are as yet unknown. But again making use of the fact that $p$ is a stationary
distribution we see that $p_{1j} = \lambda\delta_{1j} + (1 - \lambda)p_j$.


5. Finite state space and continuous time. The proof of the sufficiency of
condition (7) in the case of Theorem 4 parallels the proof of Corollary 1.

We now show that (7) is necessary. A transition probability matrix-valued
function $P(t)$ satisfying the regularity conditions posed in the assumptions in
Theorem 4 is of the form (see [2])

\[ P(t) = \exp(Gt), \]

where $G = (g_{ij})$ is such that

\[ g_{ij} \ge 0, \quad i \ne j; \qquad \sum_{j=1}^{m} g_{ij} = 0. \]

Let $w = (w_i)$, $w_i > 0$, be the initial distribution of $X(t)$. A necessary condition
that the collapsed process be Markovian for an initial vector can be written
down conveniently in matrix notation. As before, let

\[ Q_w^{(t)} = (B'D_wB)^{-1}B'D_wP(t)B \]

denote the $t$-step transition probability matrix (from time zero to time $t$) for
the collapsed process $Y(t)$ when the initial probability distribution vector of
the original process $X(t)$ is $w$. If the collapsed process $Y(t)$ is Markovian, $Q_w^{(t)}$
must satisfy the Chapman-Kolmogorov equation and thus

\[ Q_w^{(t)} \, Q_{wP(t)}^{(\tau)} = Q_w^{(t + \tau)}, \qquad t, \tau > 0, \tag{22} \]

for all $w$, $w_i > 0$. It is clear that the $w_i$'s only have to satisfy $w_i > 0$ and that
the condition $\sum_i w_i = 1$ needn’t be imposed. On differentiating relationship
(22) with respect to $\tau$ at $\tau = 0$ we obtain

\[ Q_w^{(t)} (B'D_{wP(t)}B)^{-1}B'D_{wP(t)}GB = (B'D_wB)^{-1}B'D_wP(t)GB. \tag{23} \]

Let us now differentiate (23) with respect to $t$ at $t = 0$. We then have

\[ B'D_wGB(B'D_wB)^{-1}B'D_wGB - B'D_{wG}B(B'D_wB)^{-1}B'D_wGB + B'D_{wG}GB = B'D_wG^2B. \]

This can be written more conveniently as

\[ B'(D_wG - D_{wG})[B(B'D_wB)^{-1}(B'D_w) - I]GB = 0. \tag{24} \]
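Let

\[ w_{S_\alpha} = \sum_{i \in S_\alpha} w_i, \qquad g_{i,S_\alpha} = \sum_{j \in S_\alpha} g_{ij}. \]

Condition (24) can be written down elementwise as

\[ \sum_\gamma \sum_{i \in S_\alpha} w_i \, g_{i,S_\gamma} \frac{1}{w_{S_\gamma}} \sum_{j \in S_\gamma} w_j \, g_{j,S_\beta} - \sum_{i \in S_\alpha} \sum_k w_i \, g_{ik} \, g_{k,S_\beta} - \sum_i w_i \, g_{i,S_\alpha} \frac{1}{w_{S_\alpha}} \sum_{j \in S_\alpha} w_j \, g_{j,S_\beta} + \sum_i w_i \sum_{k \in S_\alpha} g_{ik} \, g_{k,S_\beta} = 0. \tag{25} \]

If we set $w_i = u_i h$, $i \in S_\alpha$, in (25) and then let $h \downarrow 0$, the following relation is
obtained since the first two terms drop out

\[ - \sum_{i \notin S_\alpha} w_i \, g_{i,S_\alpha} \frac{1}{u_{S_\alpha}} \sum_{j \in S_\alpha} u_j \, g_{j,S_\beta} + \sum_{i \notin S_\alpha} w_i \sum_{k \in S_\alpha} g_{ik} \, g_{k,S_\beta} = 0. \]

But this is valid if and only if

\[ g_{i,S_\alpha} \frac{1}{u_{S_\alpha}} \sum_{j \in S_\alpha} u_j \, g_{j,S_\beta} = \sum_{k \in S_\alpha} g_{ik} \, g_{k,S_\beta} \]

for all $i \notin S_\alpha$. Further, since this holds for all $u_j$,

\[ g_{i,S_\alpha} \, g_{j,S_\beta} = \sum_{k \in S_\alpha} g_{ik} \, g_{k,S_\beta} \tag{26} \]

for all $i \notin S_\alpha$ and all $j \in S_\alpha$. There are only two alternatives that arise.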

If

\[ g_{i,S_\alpha} = 0 \]

for all $i \notin S_\alpha$, relationship (26) is obviously satisfied (we then say that $S_\alpha$
satisfies (i)). Otherwise $g_{i,S_\alpha} \ne 0$ for some $i \notin S_\alpha$, in which case $g_{j,S_\beta}$ for each $\beta$ is a
constant for all $j \in S_\alpha$, that is,

\[ g_{j,S_\beta} = K_{S_\alpha, S_\beta} \tag{27} \]

for all $j \in S_\alpha$, $\beta = 1, \ldots, r$ (we then say that $S_\alpha$ satisfies (ii)). The matrix $G$
is said to satisfy (7) if for each $\alpha$ separately $S_\alpha$ satisfies either (i) or (ii). Note
that if $G$ satisfies (7) the $n$th power of $G$, $G^n = (g_{ij}^{(n)})$, satisfies (7) in a
consistent manner, that is, $S_\alpha$ satisfies (i) for $G^n$ if and only if $S_\alpha$ satisfies (i) for $G$.
Since

\[ P(t) = \exp(Gt) = \sum_{k=0}^{\infty} G^k t^k / k!, \]

$P(t)$ satisfies (7). It should be noted that our proof has shown that the condition
that the Chapman-Kolmogorov equation be satisfied by the collapsed
process is enough to imply that the new process be Markovian. P. Levy [3] has
shown that this is generally not the case.
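
The last step, that $P(t) = \exp(Gt)$ inherits the property from $G$, can be illustrated numerically. The following is a minimal sketch in plain Python, assuming a hypothetical 3-state generator in which every collapsed set satisfies alternative (ii); the matrix exponential is approximated by a truncated power series, which is adequate for the small matrix used here but is not meant as a general-purpose routine. All names and numbers are illustrative.

```python
import math


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_exp(G, t, terms=60):
    """Truncated series sum_{k < terms} (G t)^k / k!."""
    m = len(G)
    result = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (G t)^k / k!
        term = matmul(term, [[g * t / k for g in row] for row in G])
        result = [[r + s for r, s in zip(rr, rs)] for rr, rs in zip(result, term)]
    return result


def block_sums(M, blocks):
    """Row-by-row sums of the entries of M over each block of columns."""
    return [[sum(row[j] for j in block) for block in blocks] for row in M]


def rows_agree_within_blocks(M, blocks, tol=1e-9):
    """Alternative (ii): states in one block have identical block sums."""
    sums = block_sums(M, blocks)
    return all(
        abs(a - b) <= tol
        for block in blocks
        for i in block[1:]
        for a, b in zip(sums[i], sums[block[0]])
    )


# Hypothetical generator: off-diagonal entries >= 0, rows sum to zero.
G = [[-3.0, 1.0, 2.0],
     [1.0, -3.0, 2.0],
     [1.0, 1.0, -2.0]]
BLOCKS = [[0, 1], [2]]
P_HALF = mat_exp(G, 0.5)

if __name__ == "__main__":
    print(rows_agree_within_blocks(G, BLOCKS))
    print(rows_agree_within_blocks(P_HALF, BLOCKS))
    print(block_sums(P_HALF, BLOCKS)[0][0], (1.0 + math.exp(-2.0)) / 2.0)
```

In this example the collapsed process has generator with rows $(-2, 2)$ and $(2, -2)$, so the probability of remaining in the first collapsed set over time $t$ is $(1 + e^{-4t})/2$, which the last line compares against the block sum of $P(1/2)$.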


6. Abstract state space. Consider a Markov process $X(t)$ with initial
probability distribution

\[ P[X(0) \in A] = P(A), \qquad A \in B(\Omega), \]

and transition probability function

\[ P(t; x, A) \]

satisfying the assumptions of Corollary 3. Then $Y(t) = f(X(t))$ is a Markovian
process with initial distribution

\[ P[Y(0) \in A'] = P[X(0) \in f^{-1}(A')] = Q(A'), \]

$A' \in f(B(\Omega))$, and transition probability function

\[ Q(t; y, A') = P[Y(t + \tau) \in A' \mid Y(\tau) = y] = P[X(t + \tau) \in f^{-1}(A') \mid X(\tau) \in f^{-1}(y)] = P(t; x, f^{-1}(A')), \qquad y \in \Omega', \ A' \in f(B(\Omega)), \]

where $x$ is such that $y = f(x)$. This follows immediately from condition (8).

It is interesting to note that one can generate new Markovian processes from
old ones by setting up $f$ so that it is consistent with the symmetries of the
transition probability mechanism of the old process. Consider $X(t)$ Brownian motion
on the line. Here the transition probability density is

\[ p(t; x, y) = (2\pi t)^{-1/2} \exp\left( -\frac{1}{2t}(x - y)^2 \right), \qquad t > 0. \]


If we set

\[ f(x) = x - a[x/a], \qquad a > 0, \]

where $[x]$ is the greatest integer less than or equal to $x$, the new Markovian
process $Y(t) = f(X(t))$ is Brownian motion on the circle. If

\[ f(x) = z \]

on all points of the form $2ka \pm z$, $0 \le z < a$, $k = 0, \pm 1, \ldots$, $Y(t)$ is Brownian
motion on a line segment of length $a$ with reflecting barriers at the endpoints.
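
The two collapsing functions just described are simple to write down explicitly. The following is a minimal sketch in Python; the function names are illustrative, and the treatment of the measure-zero boundary points (where $x$ is an odd multiple of $a$, so that $z = a$) is a choice made here and is not specified in the text.

```python
import math


def wrap_to_circle(x, a):
    """f(x) = x - a*[x/a]: the position on a circle of circumference a, in [0, a)."""
    return x - a * math.floor(x / a)


def fold_to_segment(x, a):
    """Return z in [0, a] such that x = 2*k*a + z or x = 2*k*a - z for some integer k."""
    r = x - 2 * a * math.floor(x / (2 * a))  # remainder in [0, 2a)
    return r if r <= a else 2 * a - r


if __name__ == "__main__":
    for x in (-1.25, -0.25, 0.5, 1.25, 2.5):
        print(x, wrap_to_circle(x, 1.0), fold_to_segment(x, 1.0))
```

The wrap identifies points that differ by a multiple of $a$, while the fold additionally identifies $x$ with $-x$, which is what produces reflection at the endpoints of the segment.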


As a further example consider starting out with two-dimensional Brownian
motion $(X_1(t), X_2(t))$, that is, the transition probability density is

\[ p(t; (x_1, x_2), (y_1, y_2)) = (2\pi t)^{-1} \exp\left( -\frac{1}{2t}\left[ (x_1 - y_1)^2 + (x_2 - y_2)^2 \right] \right), \qquad t > 0. \]

If

\[ f(x_1, x_2) = (u_1, u_2) \]

for all points $(x_1, x_2)$ of the form $(u_1 + ja, u_2 + ka)$, $0 \le u_1, u_2 < a$, $j, k = 0, \pm 1, \ldots$,
$(Y_1(t), Y_2(t))$ is Brownian motion on a torus. If

\[ f(x_1, x_2) = (u_1, u_2) \]

for all points of the form $(u_1 + ja, (2k + j)a \pm u_2)$, $0 \le u_1, u_2 < a$, $j, k = 0, \pm 1, \ldots$,
$(Y_1(t), Y_2(t))$ is Brownian motion on a Moebius strip with reflecting
barriers on the edges of the strip.

ASYMPTOTIC DISTRIBUTIONS OF “PSI-SQUARED” GOODNESS
OF FIT CRITERIA FOR m-TH ORDER MARKOV CHAINS¹

By Leo A. Goodman
University of Chicago

1. Introduction and Summary. Let $\{X_1, X_2, \ldots, X_N\}$ be an observed sequence
from a stochastic process, where $X_i$ can take any one of $s$ values $1, 2, \ldots, s$.
Let $f_u$ be the frequency of the $m$-tuple $u = (u_1, u_2, \ldots, u_m)$ in the sequence.
Let $H_n$ be the composite hypothesis that the process is a Markov chain
of order $n$. Let $H'_n$ be any simple hypothesis belonging to $H_n$. Let $H^*_n$ be the
maximum likelihood $H'_n$. Let the expected value of $f_u$ in a new sequence of
length $N$ given $H'_n$ be $f_{u,n}$, and given $H^*_n$ be $f^*_{u,n}$. Let

\[ \psi^2_{m,n} = \sum_u (f_u - f_{u,n})^2 / f_{u,n}, \]

\[ \psi^{*2}_{m,n} = \sum_u (f_u - f^*_{u,n})^2 / f^*_{u,n}, \]

\[ \psi^{*2}_{m,m-1} = 0. \]

Good had proposed in [7] the following two conjectures: (a) that the asymptotic
distribution ($N \to \infty$) of $\psi^{*2}_{m,n}$, when $H_n$ is true, is

\[ \mathop{*}_{\lambda=1}^{m-n-1} K_{g(\lambda)}(x/\lambda), \]

where $*$ denotes convolution, $g(\lambda) = (s - 1)^2 s^{m-\lambda-1}$, and $K_\nu(x)$ is the
$\chi^2$-distribution with $\nu$ degrees of freedom; (b) that the asymptotic distribution of
$\psi^2_{m,n}$, when $H'_n$ is true, is

\[ \mathop{*}_{\lambda=1}^{m-1} K_{g(\lambda)}(x/\lambda) * K_{s-1}(x/m), \]

mathematically independent of $n$. Conjectures (a) and (b) were proved by
Billingsley [2] for the special case $n = 0$. For the special case $n = -1$ (by convention,
$H_{-1}$ is the hypothesis of equiprobable or perfect randomness (see [7])),
Conjecture (b) was proved by Good [5] when $s$ is prime. In the present paper,
Conjecture (a) will be proved for the general case $n \ge -1$; Conjecture (b) will be
shown to be incorrect for $n > 0$, although a modified version of (b) will be proved
for $n \ge -1$. A third conjecture by Good [6] will also be proved here. It was
assumed in these earlier papers, and it will be assumed here, that all transition
probabilities in the Markov chain are positive; the results can be modified
accordingly when some of these probabilities are zero (see [1] and [10]).

Let Munn = —2 log An.m—1, Where Ax. m—1 is the ratio of the maximum likeli- 
hood given H', to that given Hi. (see [6]). For m = n + 2, the statistic =, 
is asymptotically equivalent, when H’, is true, to the likelihood ratio statistic 
Mann. For m > n+ 2, Warn is asymptotically equivalent, when H’, is true, to 

m—n 


Rat AMmsi-r,m1-», While M,,,, is asymptotically equivalent to 


m—n—l 


2 Matt 

Awl 
(see [6], [10]). Thus, YA”, corresponds asymptotically to a weighted sum of the 
likelihood ratio statisties Myson, Manssngi, +++, Minim 
m—n—1l,m—n-— 2,---, 1, respectively, while M,,,, weights these statis- 
tics equally (see [13] and reference to [13] in Section 4 herein). 

Let Linn = —2 log un.m—1, Where p,.»-1 18S the ratio of the likelihood given 
H,, to the maximum likelihood given Hi,_,. For m — 1 n = 0, the statistic 
Sn is asymptotically equivalent, when H, is true, to L,,,,. Form — 1 > n 
0, vin.n is asymptotically equivalent, when H,, is true, to 


2, With the weights 


m—1 


DB AM i-ra,m—1— + MEarsi.2; 


A=l 


while Ln, is asymptotically equivalent to DS Xo) Mayiavmaar + Lagan. 
For n > 0, the relation between y;,, and the likelihood ratio statisties Lin.n 
and M,,., is not so straightforward. However, a modification Winn of vin n (see 
Section 6 herein) is asymptotically equivalent, when H, is true, to Lm, for 
m=n-+ 1, and to ia *)M asi-a.mi-a + (m — n)Lavin for m>n +1; 
while the likelihood ratio statistic Ln.» is asymptotically equivalent to 


m—n—1 


2 Masxiunee > Basie: 
Axl 
In [10], the $m$-tuple $u$ was “split” into an $(m - n - 1)$-tuple, an $n$-tuple,
and a 1-tuple; thus obtaining $s^n$ “contingency tables” ($n \ge 0$), each $s^{m-n-1} \times s$
(see [10]). The statistic $M_{m,n}$ can be seen to be asymptotically equivalent to the
sum of the “likelihood ratio statistics” (for testing “independence” in each table)
for the $s^n$ tables, and the asymptotic distribution, when $H_n$ is true, of $M_{m,n}$ will
be $\chi^2$ with $s^n(s^{m-n-1} - 1)(s - 1) = s^m - s^{m-1} - s^{n+1} + s^n$ degrees of freedom.
It is also possible to “split” the $m$-tuple $u$ into an $(m - n - 1 - r)$-tuple,
an $n$-tuple, and a $(1 + r)$-tuple ($0 \le r \le m - n - 2$); thus obtaining $s^n$
“contingency tables,” each $s^{m-n-1-r} \times s^{1+r}$ (see [10]). The sum ${}_rM_{m,n}$ of
the likelihood ratio (or any equivalent goodness of fit) statistics for the $s^n$ tables
will have an asymptotic mean value, when $H_n$ is true, of

\[ s^n(s^{m-n-1-r} - 1)(s^{1+r} - 1) = s^m - s^{m-1-r} - s^{n+1+r} + s^n, \]

but the asymptotic distribution will not be $\chi^2$ unless $r = 0$ or $m - n - 2$.


It can be seen, using the methods developed in the present paper, that the
statistic ${}_rM_{m,n}$ will be asymptotically equivalent, when $H_n$ is true, to

\[ \sum_{\lambda=1}^{m-n-1} h(\lambda) M_{m+1-\lambda,\, m-1-\lambda}, \]

where

\[ h(\lambda) = \begin{cases} \lambda & \text{for } 0 < \lambda \le v, \\ v & \text{for } v \le \lambda \le m - n - v, \\ m - n - \lambda & \text{for } m - n - v \le \lambda \le m - n - 1, \end{cases} \]

and $v = \min[r + 1, m - n - r - 1]$. Thus, the asymptotic distribution
($N \to \infty$) of ${}_rM_{m,n}$ (or the corresponding asymptotically equivalent goodness
of fit statistics), when $H_n$ is true, is

\[ \mathop{*}_{\lambda=1}^{m-n-1} K_{g(\lambda)}[x/h(\lambda)]. \]

This result generalizes the earlier published results concerning the asymptotic
distribution of the likelihood ratio statistic $M_{m,n}$ (or the corresponding
asymptotically equivalent goodness of fit statistics) for testing the null hypothesis
$H_n$ within $H_{m-1}$, since ${}_rM_{m,n}$ for $r = 0$ or $m - n - 2$ is asymptotically
equivalent to $M_{m,n}$ (see [6], [10]). A proof of this result will not be given since the
method of proof is quite similar to that presented here for the asymptotic
distribution of $\psi^{*2}_{m,n}$.
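
The counting identities used in this section can be checked by exact integer arithmetic. The following is a minimal sketch in Python, assuming the reading $g(\lambda) = (s - 1)^2 s^{m-\lambda-1}$ of the degrees of freedom, the piecewise weights $h(\lambda)$ with $v = \min[r + 1, m - n - r - 1]$, and the fact that a variable with distribution $K_\nu(x/c)$ has mean $c\nu$; the function names are illustrative and do not come from the paper. It verifies that the degrees of freedom $g(\lambda)$ add up to those of $M_{m,n}$, and that the mean of the weighted convolution equals the stated asymptotic mean of ${}_rM_{m,n}$.

```python
def g(lam, s, m):
    """Degrees of freedom attached to the lambda-th component: (s-1)^2 * s^(m-lam-1)."""
    return (s - 1) ** 2 * s ** (m - lam - 1)


def h(lam, m, n, r):
    """Piecewise weight with v = min(r + 1, m - n - r - 1)."""
    v = min(r + 1, m - n - r - 1)
    if lam <= v:
        return lam
    if lam <= m - n - v:
        return v
    return m - n - lam


def df_M(s, m, n):
    """Degrees of freedom of M_{m,n}: s^n (s^(m-n-1) - 1)(s - 1)."""
    return s ** n * (s ** (m - n - 1) - 1) * (s - 1)


def mean_rM(s, m, n, r):
    """Stated asymptotic mean of the split statistic: s^n (s^(m-n-1-r) - 1)(s^(1+r) - 1)."""
    return s ** n * (s ** (m - n - 1 - r) - 1) * (s ** (1 + r) - 1)


def check(s, m, n):
    """Return True if all identities hold for this (s, m, n), with n >= 0 and m >= n + 2."""
    lams = range(1, m - n)  # lambda = 1, ..., m - n - 1
    ok = df_M(s, m, n) == s ** m - s ** (m - 1) - s ** (n + 1) + s ** n
    ok = ok and sum(g(lam, s, m) for lam in lams) == df_M(s, m, n)
    for r in range(0, m - n - 1):  # r = 0, ..., m - n - 2
        weighted = sum(h(lam, m, n, r) * g(lam, s, m) for lam in lams)
        ok = ok and weighted == mean_rM(s, m, n, r)
    return ok


if __name__ == "__main__":
    print(all(check(s, m, n)
              for s in (2, 3, 4)
              for n in (0, 1, 2)
              for m in range(n + 2, n + 7)))
```

For $r = 0$ and $r = m - n - 2$ the weights $h(\lambda)$ are all equal to 1, which is consistent with the statement that ${}_rM_{m,n}$ is then asymptotically equivalent to $M_{m,n}$.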


2. The Case n = −1. Let us first consider the case of equiprobable or perfect
randomness ($n = -1$). We have that $H_{-1} = H'_{-1} = H^*_{-1}$, and $\psi^2_{m,-1} = \psi^{*2}_{m,-1}$.
Thus, Conjectures (a) and (b) must be in agreement when $n = -1$. For $n = -1$,
Conjecture (a) states that the asymptotic distribution of $\psi^{*2}_{m,-1}$ is

\[ \mathop{*}_{\lambda=1}^{m} K_{g(\lambda)}(x/\lambda), \]

while (b) states that the asymptotic distribution of $\psi^2_{m,-1}$ is

\[ \mathop{*}_{\lambda=1}^{m-1} K_{g(\lambda)}(x/\lambda) * K_{s-1}(x/m). \]

Thus, we must define $K_{g(m)}(x/m)$ as $K_{s-1}(x/m)$; i.e., $K_{(s-1)^2 s^{-1}}(x/m)$ as $K_{s-1}(x/m)$.
It should also be mentioned that $\psi^2_{m,n}$ and $\psi^{*2}_{m,n}$ are defined only for $m \ge n + 1$
(with $m \ge 1$, for $n = -1$), and the symbol $\mathop{*}_{\lambda=1}^{0} K$ is to be understood as the
atomic distribution that has the total probability 1 at the value $x = 0$. Since
$H_{-1}$ is a special case of $H_0$, results for $n = -1$ will follow directly from results
for $n = 0$.

3. The Case n = 0. In the present paper, it will be convenient to deal with
circular sequences, so that (for $m = 2$) $\sum_j f_{ij} = \sum_j f_{ji} = f_i$. A more general
statement (for $m \ge 2$) can be seen to hold for circular sequences (see [6]). A
method of modifying results obtained for circular sequences so that they can
be applied to linear sequences has been given in [9], and this method can be
used to indicate that results analogous to those presented in the present paper
will hold for linear sequences. The reader is cautioned that formulas for circular
sequences cannot be applied directly to linear sequences (see [9] and
Corrigenda to [6]). It will also be convenient herein to replace $\psi^2_{m,n}$ and $\psi^{*2}_{m,n}$
by their asymptotically equivalent forms (when $H_n$ is true)

\[ \psi^2_{m,n} \approx 2 \sum_u f_u \log (f_u / f_{u,n}), \]

and $\psi^{*2}_{m,n} \approx 2 \sum_u f_u \log (f_u / f^*_{u,n})$, respectively.

Let us first consider Conjecture (a) when $n = 0$. For $m = 1$, this conjecture is
obviously correct. For $m = 2$, this conjecture was first stated in [8] and was
proved by Dawson and Good [4] and by Goodman [10]. The analogous result
for the asymptotically equivalent form of $\psi^{*2}_{2,0}$ was proved by Hoel [11].

For $m = 3$, Conjecture (a) states that

$$\psi^{*2}_{3,0} = \sum_{ijk} \frac{(f_{ijk} - f_i f_j f_k/N^2)^2}{f_i f_j f_k/N^2}
\sim \chi^2_{s(s-1)^2} + 2\chi^2_{(s-1)^2},$$

where the symbols $\chi^2_i$ denote independent random variables each having a chi-square
distribution with $i$ degrees of freedom. (The $f_i f_j f_k/N^2$ used above is not
the exact expected value, but is an asymptotic approximation; such asymptotic
approximations for expected values will be used throughout.) We have that

$$\psi^{*2}_{3,0} \cong 2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_i f_j f_k/N^2)]
= 2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_i f_{jk}/N)] + 2\sum_{jk} f_{jk} \log [f_{jk}/(f_j f_k/N)].$$

The second term in the sum is asymptotically $\chi^2_{(s-1)^2}$, by the result for $m = 2$.
The first term in the sum can be split into two parts, thus obtaining

$$2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_{ij} f_{jk}/f_j)] + 2\sum_{ij} f_{ij} \log [f_{ij}/(f_i f_j/N)].$$

By the results in [10], for the test of $H'_1$ within $H'_2$, the asymptotic distribution
of the first part is $\chi^2_{s(s-1)^2}$; the asymptotic distribution of the second part is
$\chi^2_{(s-1)^2}$ (by the results for $m = 2$). The first part is asymptotically independent of
the second part. This can be seen from the fact that their sum has the same
asymptotic behavior, under $H_0$, as the standard likelihood ratio statistic used in
testing independence in an $s^2 \times s$ contingency table (see the test of $H'_0$ within
$H'_2$ in [10]), and the two parts in the sum are obtained in the same manner as
the partitioning of the likelihood ratio for the contingency table into two independent
parts (see p. 429 in [3] and the articles referred to therein; rigorous
proofs of some of the published results concerning partitioning of contingency
tables are given in [12]²). The first part is obtained by separating the $s^2$ rows
into $s$ sets of $s$ rows, thus obtaining $s$ contingency tables, each $s \times s$, and using
the combined likelihood ratio for the $s$ tables to obtain asymptotically a $\chi^2_{s(s-1)^2}$
distribution (which leads to a test of $H'_1$ within $H'_2$ in [10]); the second part is
obtained by combining the $s$ rows in each set to obtain an $s \times s$ contingency
table, and using the likelihood ratio for this table to obtain asymptotically a
$\chi^2_{(s-1)^2}$ distribution (which leads to a test of $H'_0$ within $H'_1$ in [10]). Since the second
part of the first term in $\psi^{*2}_{3,0}$ is equal to the second term in $\psi^{*2}_{3,0}$, their sum
is asymptotically $2\chi^2_{(s-1)^2}$. Thus we have proved that
$\psi^{*2}_{3,0} \sim \chi^2_{s(s-1)^2} + 2\chi^2_{(s-1)^2}$.
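
The splitting used above for $m = 3$ is an exact algebraic identity for circular sequences, not only an asymptotic one, and this can be checked numerically. The sketch below is illustrative only; `circular_counts` and `lr_parts` are hypothetical names. It computes the full statistic, the part comparing first-order dependence with second-order dependence, and the digram independence part, so that the total should equal the first part plus twice the second.

```python
from collections import Counter
from math import log


def circular_counts(seq, m):
    """Frequencies of m-tuples in a circular sequence (the end wraps to the start)."""
    n = len(seq)
    return Counter(tuple(seq[(t + r) % n] for r in range(m)) for t in range(n))


def lr_parts(seq):
    """Return (total, order_part, independence_part) for triples of a circular sequence."""
    n = len(seq)
    f1, f2, f3 = (circular_counts(seq, m) for m in (1, 2, 3))
    # 2 * sum f_ijk * log(f_ijk / (f_i f_j f_k / N^2))
    total = 2.0 * sum(
        f * log(f * n * n / (f1[(i,)] * f1[(j,)] * f1[(k,)]))
        for (i, j, k), f in f3.items()
    )
    # 2 * sum f_ijk * log(f_ijk / (f_ij f_jk / f_j))
    order_part = 2.0 * sum(
        f * log(f * f1[(j,)] / (f2[(i, j)] * f2[(j, k)]))
        for (i, j, k), f in f3.items()
    )
    # 2 * sum f_ij * log(f_ij / (f_i f_j / N))
    independence_part = 2.0 * sum(
        f * log(f * n / (f1[(i,)] * f1[(j,)])) for (i, j), f in f2.items()
    )
    return total, order_part, independence_part
```

The identity relies on the circular convention: summing the triple counts over the first or the last index gives the digram counts exactly.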


For $m = 4$, Conjecture (a) states that

$$\psi^{*2}_{4,0} = \sum_{ijkl} \frac{(f_{ijkl} - f_i f_j f_k f_l/N^3)^2}{f_i f_j f_k f_l/N^3}
\sim \chi^2_{s^2(s-1)^2} + 2\chi^2_{s(s-1)^2} + 3\chi^2_{(s-1)^2}.$$

We have that

$$\psi^{*2}_{4,0} \cong 2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_i f_j f_k f_l/N^3)]
= 2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_i f_{jkl}/N)] + 2\sum_{jkl} f_{jkl} \log [f_{jkl}/(f_j f_k f_l/N^2)].$$

The second term in the sum is $\psi^{*2}_{3,0}$ and is asymptotically $\chi^2_{s(s-1)^2} + 2\chi^2_{(s-1)^2}$,
by Conjecture (a) for $m = 3$. The first term can be split into two parts, thus obtaining

$$2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_{ijk} f_{jkl}/f_{jk})] + 2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_i f_{jk}/N)].$$

By the results in [10] for the test of $H'_2$ within $H'_3$, the first part is asymptotically
$\chi^2_{s^2(s-1)^2}$; the second part is asymptotically $\chi^2_{(s^2-1)(s-1)}$ (by the results for $m = 3$).
The two parts are asymptotically independent. This follows from the fact that
their sum has the same asymptotic behavior, under $H_0$, as the standard likelihood
ratio statistic used in testing independence in an $s^3 \times s$ contingency table
(see the test of $H'_0$ within $H'_3$ in [10]), and the two parts in the sum are obtained
in the same manner as the partitioning of the likelihood ratio for the contingency
table into two independent parts. The first part is obtained by separating the
$s^3$ rows into $s^2$ sets of $s$ rows, thus obtaining $s^2$ contingency tables, each $s \times s$,
and using the combined likelihood ratio for the $s^2$ tables to obtain $\chi^2_{s^2(s-1)^2}$
(which leads to a test of $H'_2$ within $H'_3$ in [10]); the second part is obtained by
combining the $s$ rows in each set to obtain an $s^2 \times s$ contingency table, and using
the likelihood ratio for this table we get $\chi^2_{(s^2-1)(s-1)}$ (which leads to a test of $H'_0$
within $H'_2$ in [10]). Since the second part of the first term in $\psi^{*2}_{4,0}$ can be written
as $\chi^2_{(s^2-1)(s-1)} = \chi^2_{s(s-1)^2} + \chi^2_{(s-1)^2}$ (see the results for $m = 3$), and since the second
term in $\psi^{*2}_{4,0}$ is $\psi^{*2}_{3,0} \cong \chi^2_{s(s-1)^2} + 2\chi^2_{(s-1)^2}$ (where the $\chi^2_{s(s-1)^2}$ and the $\chi^2_{(s-1)^2}$
expressions are identical with those appearing in the second part of the first
term), their sum is asymptotically $2\chi^2_{s(s-1)^2} + 3\chi^2_{(s-1)^2}$. We have thus proved that

$$\psi^{*2}_{4,0} \sim \chi^2_{s^2(s-1)^2} + 2\chi^2_{s(s-1)^2} + 3\chi^2_{(s-1)^2}.$$

² I am indebted to T. W. Anderson for bringing [12] to my attention.


For $m = 5, 6, \cdots$, the same method of proof applies for Conjecture (a) when
$n = 0$; it is easy to see that $\psi^{*2}_{m,0}$ is asymptotically equivalent, under $H_0$, to a
weighted sum of asymptotically independent likelihood ratio statistics.

Let us now consider Conjecture (b) when $n = 0$. We have

$$\psi^2_{m,0} \cong 2\sum_u f_u \log (f_u/f_{u,0})
= 2\Big\{\sum_u f_u \log (f_u/f^*_{u,0}) + \sum_u f_u \log (f^*_{u,0}/f_{u,0})\Big\}
\cong \psi^{*2}_{m,0} + 2\sum_u f_u \log (f^*_{u,0}/f_{u,0}).$$

For $m = 1$, $f^*_{u,0} = f_u$, and the second term is $2\sum_i f_i \log (f_i/Np_i)$, which is
asymptotically $\chi^2_{s-1}$ by the standard statistical theory for goodness of fit tests.
For $m = 2$, the second term is

$$2\sum_u f_u \log (f^*_{u,0}/f_{u,0}) = 2\sum_{ij} f_{ij} \log [(f_i f_j/N)/(N p_i p_j)]
= 4\sum_i f_i \log (f_i/Np_i),$$

which is asymptotically $2\chi^2_{s-1}$. The first term $\psi^{*2}_{2,0}$ is asymptotically independent
of the second. This follows from the fact that the sum of $\psi^{*2}_{2,0}$ and
$2\sum_i f_i \log (f_i/Np_i)$ is the likelihood ratio obtained in testing the null hypothesis $H_0$ that
the transition probabilities for the Markov chain are $p_{ij} = p_j$, with the $p_j$ specified,
within the hypothesis $H'_1$ (i.e., $2\sum_{ij} f_{ij} \log (f_{ij}/f_i p_j) \cong \chi^2_{(s-1)^2} + \chi^2_{s-1} =
\chi^2_{s(s-1)}$ (see [1])), and the two terms in the sum are obtained by partitioning the
likelihood ratio into two independent parts (the independence of the two parts
follows directly from an examination of the asymptotic behavior of the $f_{ij}$ (see,
e.g., [9])). The first part is asymptotically $\chi^2_{(s-1)^2}$ and tests the null hypothesis
$H'_0$ that $p_{ij} = p_j$ (unspecified) within $H'_1$; the second part is asymptotically
$\chi^2_{s-1}$ and tests the null hypothesis $H_0$ that the $p_j$ have their specified values within $H'_0$. Thus,
$\psi^2_{2,0} \sim \chi^2_{(s-1)^2} + 2\chi^2_{s-1}$.

For $m = 3$ the second term is

$$2\sum_u f_u \log (f^*_{u,0}/f_{u,0}) = 2\sum_{ijk} f_{ijk} \log [(f_i f_j f_k/N^2)/(N p_i p_j p_k)]
= 6\sum_i f_i \log (f_i/Np_i),$$

which is asymptotically $3\chi^2_{s-1}$. The first term is independent of the second, by a
similar argument to that presented for $m = 2$. Thus,

$$\psi^2_{3,0} \sim \chi^2_{s(s-1)^2} + 2\chi^2_{(s-1)^2} + 3\chi^2_{s-1}.$$

For $m = 4, 5, 6, \cdots$, the same method of proof applies for Conjecture (b) when
$n = 0$.

We have thus given an altogether different method for proving the results
obtained in [2] for $n = 0$; the results in [2] were based on the theory of
finite-dimensional vector spaces. Since $H_{-1}$ is a special case of $H_0$, the results given in
the present section also prove that Conjectures (a) and (b), when properly
interpreted, are true for $n = -1$, which generalizes the result proved in [5] for
$n = -1$ and $s$ prime. The different method presented in the present paper may
further the understanding of the results in [2] and [5].


4. The Case n = 1. Let us now consider Conjecture (a) when $n = 1$. For
$m = 2$, the conjecture is obviously true. For $m = 3$, we have that

$$\psi^{*2}_{3,1} \cong 2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_{ij} f_{jk}/f_j)] \sim \chi^2_{s(s-1)^2},$$

by the results in [10] for the test of $H'_1$ within $H'_2$. For $m = 4$, we have that

$$\psi^{*2}_{4,1} \cong 2\sum_{ijkl} f_{ijkl} \log \{f_{ijkl}/[f_{ij} f_{jk} f_{kl}/(f_j f_k)]\}$$
$$= 2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_{ijk} f_{jkl}/f_{jk})] + 2\sum_{ijk} f_{ijk} \log [f_{ijk}/(f_{ij} f_{jk}/f_j)]
+ 2\sum_{jkl} f_{jkl} \log [f_{jkl}/(f_{jk} f_{kl}/f_k)].$$

By the results in [10] for the test of $H'_2$ within $H'_3$, the first term in the sum is
asymptotically $\chi^2_{s^2(s-1)^2}$, and the second term is asymptotically $\chi^2_{s(s-1)^2}$ (see
$m = 3$). The first term is asymptotically independent of the second. This follows
from the fact that their sum can be regarded as the combined likelihood ratio
used in testing independence in $s$ contingency tables, each $s^2 \times s$ (see the test
of $H'_1$ within $H'_3$ in [10]), and the two terms in the sum are obtained by partitioning
the likelihood ratio for each of the $s$ tables into two independent parts. For
each of the $s$ tables, the first part is obtained by separating the $s^2$ rows into $s$
sets of $s$ rows, thus obtaining $s$ new tables, each $s \times s$, and using the combined
likelihood ratio for the total of $s^2$ tables to obtain $\chi^2_{s^2(s-1)^2}$ (which is a test of
$H'_2$ within $H'_3$ in [10]); the second part, for each of the original $s$ tables, is obtained
by combining the $s$ rows in each new table to obtain an $s \times s$ table, and using
the likelihood ratio for this table (there are $s$ such tables) we get $\chi^2_{s(s-1)^2}$ (which
is a test of $H'_1$ within $H'_2$ in [10]). The third term in the sum is asymptotically
$\chi^2_{s(s-1)^2}$ (see $m = 3$), and it is equal to the second term in the sum. Thus we have
$\psi^{*2}_{4,1} \sim \chi^2_{s^2(s-1)^2} + 2\chi^2_{s(s-1)^2}$.

For $m = 5, 6, \cdots$, the same method of proof applies for Conjecture (a)
when $n = 1$; $\psi^{*2}_{m,1}$ is asymptotically equivalent to a weighted sum of asymptotically
independent likelihood ratio statistics, under $H_1$.

Let us now consider Conjecture (b) when $n = 1$. We have that

$$\psi^2_{m,1} \cong \psi^{*2}_{m,1} + 2\sum_u f_u \log (f^*_{u,1}/f_{u,1}).$$

For $m = 2$, $f^*_{u,1} = f_u$; thus, the first term $\psi^{*2}_{2,1} = 0$, and the second term is
$2\sum_{ij} f_{ij} \log (f_{ij}/N p_i p_{ij})$, where the $p_i$ are the stationary probabilities for the
first order Markov chain with constant transition matrix $P = [p_{ij}]$. Conjecture
(b) states that $\psi^2_{2,1} \sim \chi^2_{(s-1)^2} + 2\chi^2_{s-1}$. We could write

$$\psi^2_{2,1} \cong 2\sum_{ij} f_{ij} \log [f_{ij}/(f_i f_j/N)] + 2\sum_{ij} f_{ij} \log [(f_i f_j/N)/(N p_i p_{ij})].$$

The first term is not asymptotically $\chi^2_{(s-1)^2}$, except when $n = 0$; and the second
term is not asymptotically $2\chi^2_{s-1}$, except when $n = 0$. It is easy to see that
Conjecture (b) will not hold true for $n = 1$, nor for $n > 1$.

Conjecture (b) will now be modified and this modified version will be proved 
true. This modification, for the special case n = 1, was first mentioned to the 
author by P. Billingsley in a private communication. In this communication, 
he mentioned that he had also obtained independently a proof of Conjecture (a), 
for the case n = 1, by very different methods from those used in the present 
paper, and that his results for Conjecture (a) and the modified Conjecture (b), 
when n = 1, could be extended to the case when n > 1, although the detailed 
asymptotic distributions were not given in the more general case [13]. 

Let $\tilde\psi^2_{m,1} = \sum_u (f_u - \tilde f_{u,1})^2/\tilde f_{u,1}$, where $\tilde f_{u,1}$ is the expected value of $f_u$ in a
new sequence of length $N$ given $H_1$ and $f_{u_1}$; i.e., $\tilde f_{u,1} = f_{u_1} \prod_{r=1}^{m-1} p_{u_r u_{r+1}}$. Then

$$\tilde\psi^2_{m,1} \cong \psi^{*2}_{m,1} + 2\sum_u f_u \log (f^*_{u,1}/\tilde f_{u,1}).$$

When $m = 2$, the first term $\psi^{*2}_{2,1}$ in the sum is zero and the second term is

$$2\sum_{ij} f_{ij} \log (f_{ij}/f_i p_{ij}),$$

which is asymptotically $\chi^2_{s(s-1)}$ (see [1]). Thus, the asymptotic distribution of
$\tilde\psi^2_{2,1}$ is $\chi^2_{s(s-1)}$.

When $m = 3$, the first term $\psi^{*2}_{3,1}$ is asymptotically $\chi^2_{s(s-1)^2}$, and the second
term is

$$2\sum_{ijk} f_{ijk} \log (f_{ij} f_{jk}/f_j f_i p_{ij} p_{jk}) = 4\sum_{ij} f_{ij} \log (f_{ij}/f_i p_{ij}),$$

which is asymptotically $2\chi^2_{s(s-1)}$. The first term leads to a test of $H'_1$ within $H'_2$,
and the second term leads to a test of $H_1$ within $H'_1$; it can be seen that the
two terms are asymptotically independent under $H_1$. Thus, for $m = 3$, the
asymptotic distribution of $\tilde\psi^2_{m,1}$, when $H_1$ is true, is

$$\ast_{\lambda=1}^{m-2} K_{\gamma(\lambda)}(x/\lambda) \ast K_{s(s-1)}[x/(m-1)].$$

This result can be proved for $m > 3$ by the same method as given here for $m = 3$.
Thus, a modified version of Conjecture (b) holds true for $n = 1$.


5. The Case n = 2. Let us now consider Conjecture (a) when $n = 2$. For
$m = 3$, the conjecture is obviously true. For $m = 4$, we have

$$\psi^{*2}_{4,2} \cong 2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_{ijk} f_{jkl}/f_{jk})] \sim \chi^2_{s^2(s-1)^2},$$

by the results in [10]. For $m = 5$, we have

$$\psi^{*2}_{5,2} \cong 2\sum_{ijklm} f_{ijklm} \log \{f_{ijklm}/[f_{ijk} f_{jkl} f_{klm}/(f_{jk} f_{kl})]\}$$
$$= 2\sum_{ijklm} f_{ijklm} \log [f_{ijklm}/(f_{ijkl} f_{jklm}/f_{jkl})] + 2\sum_{ijkl} f_{ijkl} \log [f_{ijkl}/(f_{ijk} f_{jkl}/f_{jk})]
+ 2\sum_{jklm} f_{jklm} \log [f_{jklm}/(f_{jkl} f_{klm}/f_{kl})].$$

By the results in [10], the first term in the sum is asymptotically $\chi^2_{s^3(s-1)^2}$, and
the second term is asymptotically $\chi^2_{s^2(s-1)^2}$ (see $m = 4$). The first term is
asymptotically independent of the second; this follows by an argument similar to
those appearing earlier here. The third term in the sum is asymptotically
$\chi^2_{s^2(s-1)^2}$ (see $m = 4$), and it is equal to the second term in the sum. Thus, we have
$\psi^{*2}_{5,2} \sim \chi^2_{s^3(s-1)^2} + 2\chi^2_{s^2(s-1)^2}$.

For $m = 6, 7, \cdots$, the same method of proof applies for Conjecture (a)
when $n = 2$. Conjecture (b) will not be true for $n = 2$, as it was not for $n = 1$.
A modification of Conjecture (b) for $n = 2$ will now be given, which is similar to,
although different from, Billingsley's modification of this conjecture for the
special case $n = 1$.

Let $\tilde\psi^2_{m,2} = \sum_u (f_u - \tilde f_{u,2})^2/\tilde f_{u,2}$, where $\tilde f_{u,2}$ is the expected value of $f_u$ in a
new sequence of length $N$ given $H_2$ and $f_{u_1 u_2}$; i.e., $\tilde f_{u,2} = f_{u_1 u_2} \prod_{r=1}^{m-2} p_{u_r u_{r+1} u_{r+2}}$,
where $p_{u_1 u_2 u_3}$ is the second order transition probability that $X_t = u_3$, given
that $X_{t-1} = u_2$ and $X_{t-2} = u_1$. Then $\tilde\psi^2_{m,2} \cong \psi^{*2}_{m,2} + 2\sum_u f_u \log (f^*_{u,2}/\tilde f_{u,2})$.
When $m = 3$, the first term $\psi^{*2}_{3,2}$ in the sum is zero, and the second term is
$2\sum_{ijk} f_{ijk} \log (f_{ijk}/f_{ij} p_{ijk})$, which is asymptotically $\chi^2_{s^2(s-1)}$ (see [1]). Thus, the
asymptotic distribution of $\tilde\psi^2_{3,2}$ is $\chi^2_{s^2(s-1)}$.

When $m = 4$, the first term $\psi^{*2}_{4,2}$ is asymptotically $\chi^2_{s^2(s-1)^2}$, and the second
term is

$$2\sum_{ijkl} f_{ijkl} \log (f_{ijk} f_{jkl}/f_{jk} f_{ij} p_{ijk} p_{jkl}) = 4\sum_{ijk} f_{ijk} \log (f_{ijk}/f_{ij} p_{ijk}),$$

which is asymptotically $2\chi^2_{s^2(s-1)}$. The first term leads to a test of $H'_2$ within
$H'_3$, and the second term leads to a test of $H_2$ within $H'_2$; it can be seen that the
two terms are asymptotically independent under $H_2$. Thus, for $m = 4$, the
asymptotic distribution of $\tilde\psi^2_{m,2}$, when $H_2$ is true, is

$$\ast_{\lambda=1}^{m-3} K_{\gamma(\lambda)}(x/\lambda) \ast K_{s^2(s-1)}[x/(m-2)].$$

This result can be proved for $m > 4$ by the same method as given here for $m = 4$.
Thus, a modified version of Conjecture (b) holds true for $n = 2$.


6. The General Case. The method of proof used in the preceding sections for
$n = -1, 0, 1, 2$ can also be applied when $n = 3, 4, \cdots$. In this way, Conjecture
(a) can be proved in the general case $n \ge -1$ and the following modification of
Conjecture (b) also holds in the general case. Let $\tilde\psi^2_{m,n} = \sum_u (f_u - \tilde f_{u,n})^2/\tilde f_{u,n}$,
where $\tilde f_{u,n}$ is the expected value of $f_u$ in a new sequence of length $N$ given $H_n$
and $f_{u_1 u_2 \cdots u_n}$ ($n \ge 1$). Then, the asymptotic distribution of $\tilde\psi^2_{m,n}$, when $H_n$ is
true, is

$$\ast_{\lambda=1}^{m-n-1} K_{\gamma(\lambda)}(x/\lambda) \ast K_{s^n(s-1)}[x/(m-n)].$$

If we define $\tilde\psi^2_{m,0}$ as $\psi^2_{m,0}$, then Conjecture (b) for $n = 0$ is identical with the
modified version, and it also holds true. For $n = -1$, $H_{-1}$ is a special case of
$H_0$, and the modified version of Conjecture (b) can be applied with $n$ taken as
zero. The reader will note that the asymptotic distribution of $\tilde\psi^2_{m,n}$ is not
mathematically independent of $n$; neither was the asymptotic distribution of $\psi^{*2}_{m,n}$.
The result presented here for $\tilde\psi^2_{m,n}$ generalizes Billingsley's result for $n = 1$.
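
To make the limiting laws concrete, the sketch below lists the (weight, degrees of freedom) pairs of the weighted chi-square sums for Conjecture (a) and for the modified Conjecture (b). It assumes $\gamma(\lambda) = s^{m-\lambda-1}(s-1)^2$, which is inferred from the special cases worked out in Sections 3 to 5 rather than quoted from a definition, and it is written for $n \ge 0$ only. All function names are hypothetical.

```python
def gamma(lam, m, s):
    """Assumed degrees of freedom gamma(lambda) = s**(m - lambda - 1) * (s - 1)**2."""
    return s ** (m - lam - 1) * (s - 1) ** 2


def conjecture_a_terms(m, n, s):
    """(weight, df) pairs for lambda = 1, ..., m - n - 1 (Conjecture (a), n >= 0)."""
    return [(lam, gamma(lam, m, s)) for lam in range(1, m - n)]


def modified_b_terms(m, n, s):
    """Terms of Conjecture (a) plus the extra term (m - n) * chi-square with s**n * (s - 1) df."""
    return conjecture_a_terms(m, n, s) + [(m - n, s ** n * (s - 1))]


def limiting_mean(terms):
    """Mean of a weighted sum of independent chi-square variables: sum of weight * df."""
    return sum(weight * df for weight, df in terms)
```

For example, with $s = 3$, $m = 4$, $n = 0$ the first function returns the pairs corresponding to $\chi^2_{36} + 2\chi^2_{12} + 3\chi^2_{4}$, matching the result proved in Section 3.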

A direct proof of these results could be given for the general case; this was not
done here, since the proof proceeds along the same lines as the earlier discussion
herein, and the results may be simpler to understand by considering first $n = 0$,
$m = 1, 2, 3, 4, \cdots$; $n = 1$, $m = 2, 3, 4, \cdots$; $n = 2$, $m = 3, 4, \cdots$; etc.

In closing, we mention another conjecture by I. J. Good. In [6], the author
conjectures that, when $H_{m-1}$ is true, the variables $-2 \log \lambda_{m-1,m}$ ($m = 0, 1, 2, \cdots$)
are asymptotically independent, where $\lambda_{m-1,m}$ is the ratio of the maximum
likelihood given $H'_{m-1}$ to that given $H'_m$. If this conjecture were true, then an
elegant proof of some results for testing $H'_n$ within $H'_m$ would be available (see
[6]). We have that $-2 \log \lambda_{m-1,m} \cong \psi^{*2}_{m+1,m-1}$, when $H_{m-1}$ is true. The asymptotic
independence of the likelihood ratios follows by the same kind of argument
presented earlier in the present paper for the independence of some of the
statistics considered (see, e.g., the reason why $\psi^{*2}_{4,2}$ and $\psi^{*2}_{3,1}$ are asymptotically
independent, given $n = 1$, in the discussion here of the case $m = 4$ and $n = 1$).

The reader is referred to [13] for results that are closely related to some of 
those presented here, although the general approach and methods are very dif- 
ferent. Also, some of the work in [14], [15], and [16] has some (but not much) 
relation to the present paper.

EFFICIENCY PROBLEMS IN POLYNOMIAL ESTIMATION

By Paul G. Hoel
University of California, Los Angeles

1. Summary. Using the generalized variance as a criterion for the efficiency
of estimation, the best choice of fixed variable values within an interval for
estimating the coefficients of a polynomial regression curve of given degree is
determined for the classical regression model. Using this same criterion, some
results are obtained on the increased efficiency arising from doubling the number
of equally spaced observation points (i) when the total interval is fixed and
(ii) when the total interval is doubled. Measures of the increased efficiency
are found for the classical regression model and for models based on a particular
stationary stochastic process and a pure birth stochastic process.


2. Introduction. In the classical theory of regression, a set of values $x_1, x_2,
\cdots, x_n$ of a variable $x$ is selected and observations are made on a related variable
$y$ corresponding to those selected $x$ values. If $y_i$ denotes the $y$ value corresponding
to $x_i$, it is then assumed that $y_1, y_2, \cdots, y_n$ are uncorrelated variables with
a common variance $\sigma^2$. Now if it is assumed that the means of the $y$'s lie on a
polynomial curve of degree $k$, that is, that

$$E(y_i) = \beta_0 + \beta_1 x_i + \cdots + \beta_k x_i^k, \tag{1}$$

then a basic problem in statistics is how best to estimate the $\beta$'s.

There are two aspects to this estimation problem. One is to determine the best
method for using the information given by a set of $n$ observations $y_1, y_2, \cdots,
y_n$. The other is to determine the best method for choosing the $x$ values at which
to take observations.

Although much research has gone into studying the first aspect of the problem, 
considerably less has been done on the second. Many years ago, K. Smith [1] 
was able to determine those x values within a fixed interval that minimize the 
maximum variance of a single estimated ordinate for polynomials up to degree 
six. More recently, De La Garza [2] was able to show that just as much information
is obtained from observations made at certain $k + 1$ points in the interior
of an interval as from n distinct points in that interval. Elfving [3], Chernoff 
[4], Daniels [5], and Ehrenfeld [6] have also made contributions toward this and 
other closely related problems. 

In this paper an optimum solution based on the generalized variance is given 
for the problem of how to choose the x values in an interval for the classical 
regression model. In addition, a beginning is made on the more general problem
of how to choose x values for efficient polynomial estimation when one drops the
assumption that the y's are uncorrelated.


3. Estimation methods. When a number of parameters are to be estimated 
simultaneously, the volume of the ellipsoid of concentration of the estimates 
is often used as a measure of the efficiency of the estimates. Since the square 
of the volume of the ellipsoid of concentration is proportional to the generalized 
variance of the estimates, one can just as well use the generalized variance as 
a measure of efficiency. This is the measure that will be used in this paper for 
making comparisons of different sets of estimates. 

Suppose one wished to estimate the function $\lambda_0\beta_0 + \lambda_1\beta_1 + \cdots + \lambda_k\beta_k$,
where the $\lambda$'s are an arbitrary set of real numbers, by means of a linear estimate,
$c_1 y_1 + c_2 y_2 + \cdots + c_n y_n$. Suppose further that the estimate is to be unbiased
and possess minimum variance. Then it can be shown that the resulting estimates
for the $\beta$'s are given by the matrix formula

$$\hat\beta = (X'S^{-1}X)^{-1}X'S^{-1}y, \tag{2}$$

where $S$ is the covariance matrix of the $y$'s and $X$ is the matrix

$$X = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^k \\
1 & x_2 & x_2^2 & \cdots & x_2^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^k
\end{bmatrix}.$$

Furthermore, it can also be shown that the generalized variance of these estimates
is given by the determinant formula

$$\text{G.V.} = |X'S^{-1}X|^{-1}. \tag{3}$$

These same formulas will be obtained if one assumes that the $y$'s possess a
multivariate normal distribution and then finds the maximum likelihood estimates of
the $\beta$'s.

The advantage of the estimates given by formula (2) lies in the fact that it
can be shown that among all linear unbiased estimates of the $\beta$'s, the estimates
given by this formula possess a minimum generalized variance. Thus, if one
restricts himself to linear estimates, these are optimum estimates. All the
comparisons to be made in the following sections will assume that the estimates are
those given by formula (2), and hence that the generalized variance is given by
formula (3).
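
Formulas (2) and (3) can be sketched numerically as follows. This is a minimal illustration using NumPy, not part of the paper; the function name `gls_fit` and its argument layout are assumptions made for the example.

```python
import numpy as np


def gls_fit(x, y, S, k):
    """Return (beta_hat, generalized_variance) for a degree-k polynomial.

    beta_hat follows formula (2): (X' S^-1 X)^-1 X' S^-1 y.
    The generalized variance follows formula (3): 1 / det(X' S^-1 X).
    """
    # Rows of X are (1, x_i, x_i^2, ..., x_i^k).
    X = np.vander(np.asarray(x, dtype=float), k + 1, increasing=True)
    S_inv = np.linalg.inv(np.asarray(S, dtype=float))
    info = X.T @ S_inv @ X  # the information matrix X' S^-1 X
    beta_hat = np.linalg.solve(info, X.T @ S_inv @ np.asarray(y, dtype=float))
    return beta_hat, 1.0 / np.linalg.det(info)
```

With noiseless data lying exactly on a polynomial of degree $k$, the estimate reproduces the coefficients, and with $S = \sigma^2 I$ the generalized variance reduces to $\sigma^{2(k+1)}/|X'X|$.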


4. Classical regression. Since the classical regression model assumes that
$y_1, y_2, \cdots, y_n$ are uncorrelated with a common variance $\sigma^2$, the covariance
matrix $S$ is a diagonal matrix with elements $\sigma^2$.

Now De La Garza [2] has shown that the same information matrix, $X'S^{-1}X$,
and hence the same value of the generalized variance, can be obtained by
replacing a given set of $n$ observations at the points $x_1, x_2, \cdots, x_n$ by a total of $n$
observations made at $k + 1$ properly selected points in the interval from $x_1$
to $x_n$. These points will be denoted by $t_1, t_2, \cdots, t_{k+1}$ and the number of
observations to be made at $t_i$ will be denoted by $n_i$, where $\sum_{i=1}^{k+1} n_i = n$. In terms
of these substitute observations, the matrices in (3) are all square matrices
and therefore the determinant of their product can be obtained by taking the
product of their determinants. As a result, (3) will assume the form

$$\frac{1}{\text{G.V.}} =
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
t_1 & t_2 & \cdots & t_{k+1} \\
\vdots & \vdots & & \vdots \\
t_1^k & t_2^k & \cdots & t_{k+1}^k
\end{vmatrix}
\begin{vmatrix}
n_1/\sigma^2 & 0 & \cdots & 0 \\
0 & n_2/\sigma^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & n_{k+1}/\sigma^2
\end{vmatrix}
\begin{vmatrix}
1 & t_1 & \cdots & t_1^k \\
1 & t_2 & \cdots & t_2^k \\
\vdots & \vdots & & \vdots \\
1 & t_{k+1} & \cdots & t_{k+1}^k
\end{vmatrix}
= \frac{1}{\sigma^{2k+2}}
\begin{vmatrix}
1 & t_1 & \cdots & t_1^k \\
1 & t_2 & \cdots & t_2^k \\
\vdots & \vdots & & \vdots \\
1 & t_{k+1} & \cdots & t_{k+1}^k
\end{vmatrix}^2
\prod_{i=1}^{k+1} n_i.$$

But this determinant is a Vandermonde determinant with value $\prod_{i<j} (t_j - t_i)$;
consequently

$$\frac{1}{\text{G.V.}} = \frac{1}{\sigma^{2k+2}} \prod_{i<j} (t_i - t_j)^2 \prod_{i=1}^{k+1} n_i. \tag{4}$$

Since $\prod_{i=1}^{k+1} n_i$, subject to the restriction $\sum_{i=1}^{k+1} n_i = n$, is maximized when
$n_1 = n_2 = \cdots = n_{k+1}$, it follows that the generalized variance will be minimized
for a fixed set of $t$ values when the same number of observations is taken
at each of the $t$ values. This assumes that $n$ will be chosen to make $n/(k + 1)$
an integer.

Now consider the maximization of $\prod_{i<j} (t_i - t_j)^2$ subject to the restriction
that $x_1 \le t_i \le x_n$, $i = 1, \cdots, k + 1$. If $x$ is transformed linearly so that this
restriction assumes the form $-1 \le t_i \le 1$, $i = 1, \cdots, k + 1$, then it is known
[7] that the set of $t$ values that maximizes $\prod_{i<j} (t_i - t_j)^2$ is given by the zeros of
a polynomial which is the integral of one of the Legendre polynomials. These
zeros can be obtained from the proper tables [8].
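
In place of tables, the optimum points can be computed directly. The sketch below relies on the standard identity that the integral of the Legendre polynomial $P_k$ from $-1$ to $t$ is proportional to $(1 - t^2)P_k'(t)$, so that the $k + 1$ zeros are $\pm 1$ together with the zeros of $P_k'$. The identification of the polynomial in [7] with the integral of $P_k$ is an assumption here (it gives the required $k + 1$ zeros), and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L


def optimal_points(k):
    """k + 1 points on [-1, 1]: the end points plus the zeros of P_k'(t)."""
    coeffs = [0.0] * k + [1.0]  # Legendre-series coefficients of P_k
    interior = L.legroots(L.legder(coeffs)) if k >= 2 else np.array([])
    return np.sort(np.concatenate(([-1.0], np.real(interior), [1.0])))


def squared_difference_product(t):
    """Product over i < j of (t_i - t_j)^2, the quantity maximized in the text."""
    t = np.asarray(t, dtype=float)
    return float(np.prod([(t[i] - t[j]) ** 2
                          for i in range(len(t)) for j in range(i + 1, len(t))]))
```

For a cubic ($k = 3$) this gives the points $\pm 1$ and $\pm 1/\sqrt{5}$, whose squared difference product exceeds that of four equally spaced points on the same interval.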

It is clear from inspecting the function $\prod (t_i - t_j)^2$ that the end points of the
interval will always be chosen as two of the $t$ values. It is also clear that the
greater the range of $x$ values, the smaller will be the generalized variance.

In view of the preceding results, it follows that optimum linear estimates of 
the coefficients of classical polynomial regression are obtained by using the esti- 
mates given by formula (2), choosing as large a range of x values as possible, 
taking observations at the k + 1 points in this range given by means of the zeros 
of a tabulated polynomial, and repeating the experiment as many times as the 
total set, $n$, of observations will permit, with $n$ chosen to make $n/(k + 1)$ an
integer.

The preceding optimum manner of choosing $x$ values assumes that the generalized
variance of the estimates of the coefficients of the regression polynomial
is the proper measure of efficiency to use. If the sample regression polynomial
curve is to be used exclusively for estimating ordinates of the theoretical regression
polynomial curve, then one might prefer a measure of efficiency based on
the variances and covariances of such estimated values. From this point of view,
let $\tau_1, \cdots, \tau_{k+1}$ denote $k + 1$ arbitrary points chosen in the given interval.
Further, let $\alpha_i$ and $\hat\alpha_i$ denote the ordinate, and its estimate, of the polynomial
regression curve at $\tau_i$. Thus,

$$\alpha_i = \beta_0 + \beta_1 \tau_i + \cdots + \beta_k \tau_i^k, \qquad i = 1, \cdots, k + 1,$$

and

$$\hat\alpha_i = \hat\beta_0 + \hat\beta_1 \tau_i + \cdots + \hat\beta_k \tau_i^k, \qquad i = 1, \cdots, k + 1.$$

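Calculations will yield the covariance formula

$$m_{ij} = E(\hat\alpha_i - \alpha_i)(\hat\alpha_j - \alpha_j) = \sum_{r=0}^{k} \sum_{s=0}^{k} \sigma_{rs} \tau_i^r \tau_j^s,$$

where $\sigma_{rs}$ is the covariance of $\hat\beta_r$ and $\hat\beta_s$. Since the generalized variance is the
determinant of the covariance matrix, the generalized variance of the $\hat\alpha$'s will
be equal to the determinant $|m_{ij}|$. But it will be observed that the matrix $(m_{ij})$
can be written in the form

$$(m_{ij}) =
\begin{bmatrix}
1 & \tau_1 & \cdots & \tau_1^k \\
1 & \tau_2 & \cdots & \tau_2^k \\
\vdots & \vdots & & \vdots \\
1 & \tau_{k+1} & \cdots & \tau_{k+1}^k
\end{bmatrix}
\begin{bmatrix}
\sigma_{00} & \cdots & \sigma_{0k} \\
\vdots & & \vdots \\
\sigma_{k0} & \cdots & \sigma_{kk}
\end{bmatrix}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\tau_1 & \tau_2 & \cdots & \tau_{k+1} \\
\vdots & \vdots & & \vdots \\
\tau_1^k & \tau_2^k & \cdots & \tau_{k+1}^k
\end{bmatrix}.$$

Since $|\sigma_{rs}|$ is the generalized variance of the $\hat\beta$'s, it follows that

$$\text{G.V.}(\hat\alpha) = \text{G.V.}(\hat\beta) \prod_{i<j}^{k+1} (\tau_i - \tau_j)^2.$$

This result shows that the generalized variance of the estimates of the ordinates
of a polynomial regression curve at $k + 1$ arbitrary points in an interval will be
minimized when the generalized variance of the estimates of the coefficients of
the polynomial regression curve is minimized.¹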

A recent paper by Guest [11], which was published after this paper had been
submitted, has generalized the results of Smith [1] to polynomials of any degree.
He shows that the values of $t_1, t_2, \cdots, t_{k+1}$ that minimize the maximum variance
of a single estimated ordinate are given by means of the zeros of the derivative
of a Legendre polynomial. It is easily seen that this set of values is the same set
which minimizes the generalized variance above. Thus, whether one is interested
in efficient estimation of regression coefficients, or in efficient ordinate estimation,
either at $k + 1$ points or one point, the optimum choice of $t$ values is the
same.

¹ I am indebted to Professor John Tukey for suggesting this relationship.


5. Comparison methods. When the assumption that the y’s are uncorrelated 
is dropped, the problem of how best to choose the x’s becomes very difficult. 
The choice will depend in a complicated manner upon the covariance matrix 
S. As a consequence, comparisons will be made only for equally spaced sets of 
points and only for three classes of covariance matrices. The sets of points that 
were selected for consideration are the following: 

(1) $n$ equally spaced points in the interval $(0, l)$

(2) $2n$ equally spaced points in the interval $(0, l)$

(3) $2n$ equally spaced points in the interval $(0, 2l)$

(4) two sets of observations of type (1).

A comparison of the relative advantages of choices (2), (3), and (4) over (1)
will be made by comparing their generalized variances. Letting $\delta$ denote the
interval between consecutive $x$ values, these generalized variances will be denoted
by G.V. $(n, \delta)$, G.V. $(2n, \delta/2)$, G.V. $(2n, \delta)$, and G.V. (2 runs), respectively.

The three classes of covariance matrices that will be studied are the following:

(a) uncorrelated variables, common variance

(b) $\rho(y_i, y_j) = e^{-\alpha |x_i - x_j|}$, $\alpha > 0$, common variance

(c) covariance matrix of a pure birth stochastic process.

The first of these is the classical regression model considered in the preceding 
section. The second is the covariance matrix of a particular stationary stochastic 
process. The third was selected because it represents a stochastic process of the 
non-stationary type and in which the covariances grow as x increases. These 
three covariance matrices cover a rather wide range of correlation relationships 
and therefore conclusions obtained from them should have a rather wide range 
of application. 

For comparison purposes it is advantageous to consider the following three
ratios:

$$R_1 = \left[\frac{\text{G.V.}(n, \delta)}{\text{G.V.}(2n, \delta/2)}\right]^{1/(k+1)}, \qquad
R_2 = \left[\frac{\text{G.V.}(n, \delta)}{\text{G.V.}(2n, \delta)}\right]^{1/(k+1)}, \qquad
R_3 = \left[\frac{\text{G.V.}(n, \delta)}{\text{G.V.}(m \text{ runs})}\right]^{1/(k+1)}. \tag{5}$$

The reason for these choices is that it is easily shown that $R_3$ has the value $m$;
consequently if the value of $R_1$, for example, should turn out to be $m$, it can
be concluded that $m$ runs of the basic experiment are needed to yield the same
efficiency of estimation as that obtained by doubling the number of equally
spaced observation points in the given interval. All comparisons will be made
in this manner, that is, by stating the number of runs of the experiment needed
to yield the same efficiency as the choice of $x$ values being considered.


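6. Uncorrelated variables. It will be assumed that $n > k + 1$; consequently
the $X$ matrix in (3) will not be a square matrix and formula (4) will not be
applicable. Under equal spacing in the interval $(0, l)$, the $x$ values will be chosen as
$x_i = i\delta$. As a result, the $X$ matrix will assume the form

$$X = \begin{bmatrix}
1 & \delta & \cdots & \delta^k \\
1 & 2\delta & \cdots & (2\delta)^k \\
\vdots & \vdots & & \vdots \\
1 & n\delta & \cdots & (n\delta)^k
\end{bmatrix}. \tag{6}$$

Since $S^{-1}$ is a diagonal matrix with elements $1/\sigma^2$, it is easily seen that (3)
reduces to

$$\frac{1}{\text{G.V.}(n, \delta)} = \frac{\delta^{k(k+1)}}{\sigma^{2(k+1)}}
\begin{vmatrix}
n & \sum_1^n i & \cdots & \sum_1^n i^k \\
\sum_1^n i & \sum_1^n i^2 & \cdots & \sum_1^n i^{k+1} \\
\vdots & \vdots & & \vdots \\
\sum_1^n i^k & \sum_1^n i^{k+1} & \cdots & \sum_1^n i^{2k}
\end{vmatrix}. \tag{7}$$

The value of this determinant is known [10] to be the polynomial displayed in
(8); hence

$$\frac{1}{\text{G.V.}(n, \delta)} = \frac{\delta^{k(k+1)}}{\sigma^{2(k+1)}}\, A\, n^{k+1} (n^2 - 1)^k (n^2 - 2^2)^{k-1} \cdots (n^2 - k^2), \tag{8}$$

where $A = (1!\, 2! \cdots k!)^4 / (1!\, 2! \cdots (2k + 1)!)$. The value of $R_1$ given in (5)
then becomes

$$R_1 = \frac{1}{2^k} \left[\frac{(2n)^{k+1} (4n^2 - 1^2)^k \cdots (4n^2 - k^2)}{n^{k+1} (n^2 - 1^2)^k \cdots (n^2 - k^2)}\right]^{1/(k+1)}. \tag{9}$$

Using (8) and (5), it follows readily that

$$R_2 = 2^k R_1. \tag{10}$$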
Now consider the limiting values of $R_1$ and $R_2$ as $n \to \infty$. The resulting values
may be considered as asymptotic measures of efficiency. From (9) and (10) it
follows that

$$\lim_{n \to \infty} R_1 = 2 \quad \text{and} \quad \lim_{n \to \infty} R_2 = 2^{k+1}.$$

The first result implies that if one has a large number of equally spaced points
in a fixed interval at which observations are made, then two runs of the experiment
will yield the same efficiency of estimation as doubling the number of
equally spaced points in that interval. The second result implies, for example,
that if the polynomial regression curve is of degree 4, then 32 runs of the experiment
will be needed to yield the same efficiency of estimation as doubling the
number of points by doubling the interval over which observations are to be
made. It is clear from this second result that the higher the degree of the polynomial
the more important it is to extend the range of $x$ values as far as possible.
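
The closed form (8) and the limits of $R_1$ and $R_2$ can be checked with a short computation. The sketch below is illustrative only and all function names are hypothetical: it evaluates the determinant in (7) exactly with rational arithmetic, compares it with the polynomial in (8) (omitting the $\delta$ and $\sigma$ factors), and evaluates (9) and (10) for a given $n$ and $k$.

```python
from fractions import Fraction
from math import factorial, prod


def moment_matrix(n, k):
    """(k+1) x (k+1) matrix with entries sum_{i=1..n} i**(p+q), as in (7)."""
    return [[Fraction(sum(i ** (p + q) for i in range(1, n + 1)))
             for q in range(k + 1)] for p in range(k + 1)]


def exact_det(rows):
    """Determinant by exact Gaussian elimination over Fractions."""
    a = [row[:] for row in rows]
    det = Fraction(1)
    for c in range(len(a)):
        pivot = next((r for r in range(c, len(a)) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, len(a)):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return det


def closed_form(n, k):
    """A * n^(k+1) * (n^2-1)^k * ... * (n^2-k^2), the polynomial in (8)."""
    A = Fraction(prod(factorial(j) for j in range(1, k + 1)) ** 4,
                 prod(factorial(j) for j in range(1, 2 * k + 2)))
    return A * n ** (k + 1) * prod((n * n - j * j) ** (k + 1 - j)
                                   for j in range(1, k + 1))


def r1(n, k):
    """R_1 from formula (9); requires n > k."""
    ratio = 2.0 ** (k + 1) * prod(((4 * n * n - j * j) / (n * n - j * j)) ** (k + 1 - j)
                                  for j in range(1, k + 1))
    return ratio ** (1.0 / (k + 1)) / 2 ** k


def r2(n, k):
    """R_2 from formula (10)."""
    return 2 ** k * r1(n, k)
```

For a quadratic with $n = 5$ both the exact determinant and the closed form equal 700, and for large $n$ the two ratios approach 2 and $2^{k+1}$.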


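7. Stationary process model. Denoting the correlation between $y_i$ and $y_j$ by
$\rho_{ij}$, it follows under equal spacing that the correlation function for model (b)
will assume the form

$$\rho_{ij} = e^{-\alpha |x_i - x_j|} = e^{-\alpha\delta |i - j|}.$$

Letting $w = e^{-\alpha\delta}$ and setting $\sigma = 1$, since it will always cancel out in the $R$
ratios, it will be seen that the covariance matrix here is given by

$$S = \begin{bmatrix}
1 & w & w^2 & \cdots & w^{n-1} \\
w & 1 & w & \cdots & w^{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
w^{n-1} & w^{n-2} & w^{n-3} & \cdots & 1
\end{bmatrix}.$$

Calculations will show that the inverse of $S$ is given by

$$S^{-1} = \frac{1}{1 - w^2} \begin{bmatrix}
1 & -w & 0 & \cdots & 0 & 0 \\
-w & 1 + w^2 & -w & \cdots & 0 & 0 \\
0 & -w & 1 + w^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -w & 1
\end{bmatrix}.$$
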

If $S^{-1}$ is written as the sum of several matrices and then premultiplied by $X'$ and
postmultiplied by $X$, and finally brought together again into one matrix, it will
be found that (3) assumes the form

$$\frac{1}{\text{G.V.}(n, \delta)} = \frac{\delta^{k(k+1)}}{(1 - w^2)^{k+1}}\, |B(n, w)|, \tag{11}$$

where $B(n, w)$ is the matrix whose element in row $p + 1$ and column $q + 1$ is
given by

$$b_{p+1,q+1} = (w^2 + 1) \sum_{i=1}^{n} i^{p+q} - w \sum_{i=2}^{n} [(i - 1)^p i^q + i^p (i - 1)^q] - w^2 (n^{p+q} + 1). \tag{12}$$

Since $w = e^{-\alpha\delta}$, the value of G.V. $(2n, \delta/2)$ can be obtained by replacing $n$ by
$2n$, $\delta$ by $\delta/2$, and $w$ by $\sqrt{w}$ in (11). As a result, it will follow that

$$R_1 = \frac{1 + w}{2^k} \left[\frac{|B(2n, \sqrt{w})|}{|B(n, w)|}\right]^{1/(k+1)}.$$

Similarly,

$$R_2 = \left[\frac{|B(2n, w)|}{|B(n, w)|}\right]^{1/(k+1)}.$$

Now allow $n \to \infty$. From (12) it will be observed that the dominating part of
$b_{p+1,q+1}$ is $(w - 1)^2 \sum i^{p+q}$. As a result, the asymptotic value of the determinant
$|B(n, w)|$ is

$$\begin{vmatrix}
(w - 1)^2 n & (w - 1)^2 \sum i & \cdots & (w - 1)^2 \sum i^k \\
(w - 1)^2 \sum i & (w - 1)^2 \sum i^2 & \cdots & (w - 1)^2 \sum i^{k+1} \\
\vdots & \vdots & & \vdots \\
(w - 1)^2 \sum i^k & (w - 1)^2 \sum i^{k+1} & \cdots & (w - 1)^2 \sum i^{2k}
\end{vmatrix}.$$

But this is merely $(w - 1)^{2k+2}$ times the determinant in (7), which in turn has
the asymptotic value $A n^{(k+1)^2}$. From the preceding results, it follows that

$$\lim_{n \to \infty} R_1 = \frac{2 (w + 1)(\sqrt{w} - 1)^2}{(w - 1)^2} = \frac{2 (w + 1)}{(\sqrt{w} + 1)^2}$$

and

$$\lim_{n \to \infty} R_2 = 2^{k+1}.$$

For the purpose of seeing the implications of these formulas, consider the
numerical value $w = e^{-a\delta} = .64$. This value implies that the correlation coefficient
between neighboring $y$ values is .64. Calculations yield the values

$$\lim_{n\to\infty} R_1 = 1.01 \quad\text{and}\quad \lim_{n\to\infty} R_2 = 2^{k+1}.$$

Thus, doubling the number of observation points in a given interval, when there
are already a large number of such points, gives practically no additional
estimation information. The value of $R_2$, however, shows that the same asymptotic
efficiency is gained here as in the case of uncorrelated variables. For correlated
variables like those being considered in this section, it is clear that the interval
over which observations are to be made should be extended as far as possible,
but that if it can't be extended, repeating the experiment is far more efficient
than taking additional observation points.
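
The arithmetic behind the quoted limits is short enough to reproduce. The following sketch assumes the limiting forms $\lim R_1 = 2(1+w)/(1+\sqrt{w})^2$ and $\lim R_2 = 2^{k+1}$; the function names are illustrative only.

```python
import math


def lim_r1_stationary(w):
    """Limiting value of R_1 for the stationary process model."""
    return 2 * (1 + w) / (1 + math.sqrt(w)) ** 2


def lim_r2(k):
    """Limiting value of R_2 for a polynomial of degree k."""
    return 2 ** (k + 1)


print(round(lim_r1_stationary(0.64), 2))   # 1.01
print([lim_r2(k) for k in range(1, 5)])    # [4, 8, 16, 32]
```

Setting $w = 0$ in the same expression gives 2, the limiting value of $R_1$ quoted for uncorrelated variables.
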


8. Pure birth process model. Although a pure birth process is a discrete
process with an exponential regression curve, it was selected only for its
covariance matrix properties, which are quite different from those of the two
preceding models.
If $b$ denotes the constant asymptotic birth rate, $y_0$ the population size at time
$t_0$, and $y$ the population size at time $t > t_0$, then the conditional probability
function for $y$, given $y_0$, is

$$P\{y_0, y; t_0, t\} = \binom{y-1}{y_0-1} e^{-b y_0 (t-t_0)}\left[1 - e^{-b(t-t_0)}\right]^{y-y_0}.$$

Using this formula, expected value calculations will show that the covariance of
$y_i$ and $y_j$, $j \ge i$, is given by

$$\sigma_{ij} = y_0 e^{b(t_j-t_0)}\left[e^{b(t_i-t_0)} - 1\right].$$

Under equal spacing as before, $t_0 = 0$ and $t_i = i\delta$; hence letting $z = e^{b\delta}$,

$$\sigma_{ij} = y_0 z^j (z^i - 1).$$
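
As a check on the probability function and on the diagonal case $i = j$ of the covariance formula, the total mass, mean and variance can be summed numerically. This is a sketch under the assumption that the probability function has the negative binomial form $\binom{y-1}{y_0-1} e^{-b y_0 (t-t_0)} [1 - e^{-b(t-t_0)}]^{y-y_0}$; the truncation point `y_max` and the parameter values are arbitrary choices made for illustration.

```python
import math


def birth_pmf(y, y0, b, elapsed):
    """P{y0, y; t0, t} for a pure birth process, with elapsed = t - t0."""
    p = math.exp(-b * elapsed)
    return math.comb(y - 1, y0 - 1) * p ** y0 * (1 - p) ** (y - y0)


def moments(y0, b, elapsed, y_max=500):
    """Total mass, mean and variance, truncating the sum at y_max."""
    probs = [(y, birth_pmf(y, y0, b, elapsed)) for y in range(y0, y_max + 1)]
    total = sum(pr for _, pr in probs)
    mean = sum(y * pr for y, pr in probs)
    var = sum((y - mean) ** 2 * pr for y, pr in probs)
    return total, mean, var


y0, b, delta = 3, 0.5, 1.0
z = math.exp(b * delta)
total, mean, var = moments(y0, b, delta)
# Compare with y0 * z for the mean and y0 * z * (z - 1) for the variance.
print(total, mean - y0 * z, var - y0 * z * (z - 1))
```
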


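From this formula for $\sigma_{ij}$ it follows that

$$\sigma_{i,i+m} = z^m \sigma_{ii} \quad\text{and}\quad \frac{\sigma_{ii}}{\sigma_{jj}} = \frac{z^i(z^i-1)}{z^j(z^j-1)}. \tag{13}$$

As a result, the covariance matrix $S$ assumes the form

$$S = \begin{bmatrix} \sigma_{11} & z\sigma_{11} & \cdots & z^{n-1}\sigma_{11} \\ z\sigma_{11} & \sigma_{22} & \cdots & z^{n-2}\sigma_{22} \\ \vdots & & & \vdots \\ z^{n-1}\sigma_{11} & z^{n-2}\sigma_{22} & \cdots & \sigma_{nn} \end{bmatrix}.$$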


The second of formulas (13) enables this matrix to be expressed as the product 
of the following two matrices. 


* 0 0 
z—] ‘i ' 
z-—-l2z-2z z—2 
Ox» a a2 n _%—-2 
9 3 0 ‘se 9 22 
— | 
z-1 27-1 z— 1 
) 0 os 
( n 
ne | 


l 
z+1 -z -:-: 0 0 |: = Leg 


—1 z+l1ées:-:: 0 O 

yo(z — 1), : :  # 
0) 0 sss 2-4 1-2 : 
() 0 vee —1 1 os 
Additional lengthy computations, similar to those employed in the preceding 
section, will show that (3) assumes the form 


l a’) | O(n, 2) 


G.V.(n, 6) yFFQg — 79) *2" 


where C(n, z) is the matrix whose element in row p + 1 and column q + 1 is 
given by 





FF — PS = 1 (3? — 2”)(3° — 2") 
Cart nae — F_— 
(n? — (n — 1)")(n* — (n — 1)*) 
be vse mh Hee tr > -(n is. 
lor p l or q 1, Cpsiyaa 18 defined by ey z” /(z — 1) — 1, and ey 
Chi a —1,¢4> 1. 


When $n \to \infty$, the elements of this matrix exclusive of those in the first row
and first column, converge to functions of $z$, for $z > 1$. Let

$$g_{pq}(z) = \lim_{n\to\infty} c_{p+1,q+1}.$$

Since, for $z > 1$, $c_{11}$ dominates $c_{1j}$, $j > 1$, the determinant $|C(n, z)|$ will possess
the asymptotic value


gui (2) -9+  gae(z) | 
2— 1) 
1gu(z) +++) Oex(z) 
For $k < 5$ it has been shown that the preceding determinant has the value 
k(k—1)/2 
C2 
2 
(z — 1) 


where $c$ depends on $k$ but not on $z$. Using these results the asymptotic value of
the generalized variance is given by

$$\frac{1}{\mathrm{G.V.}(n,\delta)} = \frac{\delta^{k(k+1)}\, c\, z^{k(k-1)/2}}{y_0^{k+1}(z-1)^{k^2+k+1}}. \tag{14}$$

From this result it is easily shown that

$$\lim_{n\to\infty} R_1 = \frac{(\sqrt{z}+1)^{(k^2+k+1)/(k+1)}}{2^k\, z^{k(k-1)/[4(k+1)]}}.$$

Since (14) does not involve $n$, it follows that

$$\lim_{n\to\infty} R_2 = 1.$$
As a numerical illustration here, let $z = e^{b\delta} = 10/9$. This value implies that
the correlation between $y_1$ and $y_2$ is approximately .7 and increases between
neighboring $y$ values as one moves out on the axis. Calculations here yield the
following limiting values for $R_1$.

k                       1       2       3       4
lim R_1 (n → ∞)        1.47    1.32    1.25    1.20

These limiting values of $R_1$ show that some additional estimation information
is gained by doubling the number of points in a fixed interval but that repeating
the experiment yields considerably more information. The limiting value of $R_2$
would seem to indicate that no additional information is gained by extending
the interval. This limiting result, however, is not realistic for small samples as
will be seen in the next section.
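
The tabulated limits can be recomputed from a closed form. The sketch below assumes the reading $\lim R_1 = (\sqrt{z}+1)^{(k^2+k+1)/(k+1)} / (2^k z^{k(k-1)/[4(k+1)]})$, which is a reconstruction from a damaged formula rather than a confirmed expression; the function name is illustrative only.

```python
import math


def lim_r1_birth(k, z):
    """Limiting R_1 for the pure birth model under the assumed closed form."""
    root = math.sqrt(z)
    numerator = (1 + root) ** ((k * k + k + 1) / (k + 1))
    denominator = 2 ** k * z ** (k * (k - 1) / (4 * (k + 1)))
    return numerator / denominator


for k in range(1, 5):
    print(k, lim_r1_birth(k, 10 / 9))
```

Under this reading the values for $k = 1, 2, 3$ round to the tabulated 1.47, 1.32 and 1.25, while $k = 4$ gives roughly 1.206 against the tabulated 1.20, a difference that may be due to rounding in the original table.
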


9. Numerical results. Since the asymptotic measures of estimation efficiency
obtained in the preceding sections may not be very realistic for small numbers
of observations, some numerical computations were made with the assistance of
high speed computing equipment. The values of $w = .64$ and $z = 10/9$ used
previously were used in these computations. Values of $n = 5$ and $n = 10$ were
chosen but only the results for $n = 10$ are given because some of the $n = 5$
values appeared questionable and because there were only moderate differences
between the two sets of values. The limiting values of $R_1$ and $R_2$ are shown in
parentheses adjacent to the computed values. In these computations, adjustments
were made in the values of $R_1$ and $R_2$ to allow for the fact that doubling
the number of points in an interval extends the total interval spanned by the
points when the first point is located at $x = \delta$. These adjustments essentially
kept the spanned interval unchanged. This was accomplished by replacing $\delta/2$
by $\delta(n-1)/(2n-1)$ in the denominator of $R_1$ and $\delta$ by $\delta(2n-2)/(2n-1)$ in
the denominator of $R_2$.


Values of R_1

k     Model (a)       Model (b)       Model (c)
1     1.90 (2)        1.03 (1.01)     1.43 (1.47)
2     1.81 (2)        1.02 (1.01)     1.25 (1.32)
3     1.72 (2)        1.02 (1.01)     1.19 (1.25)
4     1.64 (2)        1.03 (1.01)     1.18 (1.20)

Values of R_2

k     Model (a)       Model (b)       Model (c)
1     3.80 (4)        2.91 (4)        1.83 (1)
2     7.24 (8)        5.01 (8)        1.85 (1)
3     13.76 (16)      8.94 (16)       3.21 (1)
4     26.24 (32)      16.41 (32)      6.15 (1)

It will be observed that the asymptotic values of $R_2$ are poor approximations
for models (b) and (c). These results seem to indicate that in general one should
always attempt to extend the range over which observations are to be taken as
far as possible and the higher the degree of polynomial the greater is the
advantage. They also seem to indicate that if the range can't be extended, it is
considerably more efficient to replicate the experiment than double the number of
observations, particularly if the variables are strongly correlated.


ON THE GENERAL CANONICAL CORRELATION DISTRIBUTION
By A. G. CONSTANTINE AND A. T. JAMES


Division of Mathematical Statistics, C.S.I.R.O.' 


1. Summary. The paper is divided into two parts: 

A. An elementary derivation of Bartlett’s results on the distribution of the 
canonical correlation coefficients using exterior differential forms. Briefly, our 
method consists of taking the original multivariate normal distribution, trans- 
forming to the canonical correlations and other variables, and then integrating 
out these extraneous variables. 

B. A new method of calculating the conditional moments which appear in 
Bartlett’s expansion of this distribution, based on the process of averaging over 
the orthogonal group. This method allows the calculation of moments of any 
order. 


Part A 


2. Introduction. Bartlett [1] obtained the general canonical correlation
distribution as a multiple power series in the true canonical correlations $\rho_i$. In the
case of more than one non-zero correlation $\rho_i$, the coefficients in this expansion
depend on the conditional moments of the sample (ordinary) correlations $s_i$
between the pairs of transformed variates representing the true canonical
variates, when the sample canonical correlations $r_i$ between the sample canonical
variates are fixed.

Bartlett derived his results by a formal generalization of the argument used 
by Fisher [2] in calculating the distribution of the multiple correlation coefficient. 
We shall give a new proof of Bartlett’s results in a concrete form more suitable 
for our purposes. Throughout this paper we shall use the concepts of exterior 
differential forms and alternating products of these forms. The definition and a 
discussion of these concepts may be found in James [6]. 

Consider a dependent vector variate with $p$ components and an independent
vector variate with $q \ge p$ components. (Here the terms "dependent" and
"independent" are to be understood in the regression sense.) If we take a sample
with $n\ (\ge p + q)$ degrees of freedom, we may represent it by the $p + q$ column
vectors $\xi_1, \xi_2, \ldots, \xi_p$ and $\eta_1, \eta_2, \ldots, \eta_q$, each containing $n$ components. The
dependent vector is considered to be a normal variate, and we may distinguish
two cases, according as the independent variate is assumed to be (a) a normal
variate or (b) a set of fixed vectors in the sample space. In either case we may,
without loss of generality, assume the $\xi_i$ and $\eta_j$ to be the canonical variates (see
Hotelling [4]). This means that in case (a) the $n$ components of each vector are
standard normal variates with the joint distribution

$$\prod_{i=1}^{p}\left\{(2\pi)^{-n}(1-\rho_i^2)^{-n/2}\exp\left[-\frac{\xi_i'\xi_i - 2\rho_i\xi_i'\eta_i + \eta_i'\eta_i}{2(1-\rho_i^2)}\right]\prod_{\nu=1}^{n} d\xi_{\nu i}\,d\eta_{\nu i}\right\} \times \prod_{j=p+1}^{q}\left\{(2\pi)^{-n/2}\exp\left(-\eta_j'\eta_j/2\right)\prod_{\nu=1}^{n} d\eta_{\nu j}\right\}. \tag{2.1}$$

In case (b), the non-central means case, we may assume the components of the
$\xi_i$ to be independently distributed with unit variance, and the $\eta_j$ to be vectors
lying along the first $q$ co-ordinate axes of the sample space. $\eta_1, \ldots, \eta_p$ may
also be identified with the mean vectors of $\xi_1, \ldots, \xi_p$. The joint distribution
of the $\xi_{\nu i}$ is therefore

$$\prod_{i=1}^{p}\left\{(2\pi)^{-n/2}\exp\left[-(\xi_i'\xi_i - 2\xi_i'\eta_i + \eta_i'\eta_i)/2\right]\prod_{\nu=1}^{n} d\xi_{\nu i}\right\}. \tag{2.2}$$

We denote sample correlations between $\xi_i$ and $\eta_i$ by $s_i$ and the sample
canonical correlations between the sample canonical variates by $r_i$. The $r_i$ may also
be interpreted as the cosines of the critical angles between the two planes
spanned by $x_1, \ldots, x_p$ and $y_1, \ldots, y_q$ respectively, where the $x_i$ and $y_j$ are the
sample canonical variates. The distribution of the $r_i$ for each of the two cases
mentioned above will be derived in sections 3 and 4 respectively.


3. Distribution of the canonical correlation coefficients. Our starting point is
the distribution (2.1). The distribution of the canonical correlations $r_i$ will be
obtained by expressing this distribution in terms of the $r_i$ and other variables
and integrating over the ranges of the latter. First of all, let us dispose of the
lengths of the vectors $\xi_i$ and $\eta_j$.

Put $\xi_i = \tau_i w_i$ and $\eta_j = \sigma_j z_j$ where $\tau_i$ and $\sigma_j$ are the unit vectors along $\xi_i$ and
$\eta_j$ respectively, and $w_i$ and $z_j$ are their lengths. Then

$$\prod_{\nu=1}^{n} d\xi_{\nu i} = w_i^{n-1}\,dw_i\,dS(\tau_i) \tag{3.1}$$

where $dS(\tau_i)$ is the element of area on the unit sphere in $n$-space. With an
analogous expression for $\prod d\eta_{\nu j}$ the distribution (2.1) becomes

$$\prod_{i=1}^{p}\left\{\frac{1}{2^{n-2}(1-\rho_i^2)^{n/2}[\Gamma(n/2)]^2}\exp\left[-\frac{w_i^2 + z_i^2 - 2\rho_i s_i w_i z_i}{2(1-\rho_i^2)}\right](w_i z_i)^{n-1}\,dw_i\,dz_i\right\}$$
$$\times \prod_{j=p+1}^{q}\left\{\frac{1}{2^{n/2-1}\Gamma(n/2)}\exp\left[-\tfrac{1}{2}z_j^2\right]z_j^{n-1}\,dz_j\right\}\prod_{i=1}^{p}\frac{\Gamma(n/2)}{2\pi^{n/2}}\,dS(\tau_i) \times \prod_{j=1}^{q}\frac{\Gamma(n/2)}{2\pi^{n/2}}\,dS(\sigma_j) \tag{3.2}$$

where $s_i = \tau_i'\sigma_i$ (see section 2). The constants have been split up to make the
latter factors probability distributions.

The integrals of the factors containing $z_j$ for $j = p + 1, \ldots, q$ are obviously
unity. Furthermore, by expanding the factor $\exp[(1-\rho_i^2)^{-1}\rho_i s_i w_i z_i]$ in a power
series and integrating term-by-term with respect to $w_i$ and $z_i$ $(i = 1, \ldots, p)$
we obtain

$$\int_0^\infty\!\!\int_0^\infty \frac{1}{2^{n-2}(1-\rho_i^2)^{n/2}[\Gamma(n/2)]^2}\exp\left[-\frac{w_i^2 + z_i^2 - 2\rho_i s_i w_i z_i}{2(1-\rho_i^2)}\right](w_i z_i)^{n-1}\,dw_i\,dz_i$$
$$= (1-\rho_i^2)^{n/2}\,{}_2F_1(n/2, n/2; 1/2; \rho_i^2 s_i^2) + \text{an odd function of } s_i, \tag{3.3}$$

where ${}_2F_1$ is the Gaussian hypergeometric function. Later on, we shall see that
the odd function of $s_i$ vanishes in the subsequent integrations.
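
The even part of the term-by-term integration can be checked coefficient by coefficient. In the sketch below it is assumed that integrating the $k$-th term of the exponential series over $w_i$ and $z_i$ leaves the coefficient $2^k\,\Gamma((n+k)/2)^2 / (k!\,\Gamma(n/2)^2)$ of $(\rho_i s_i)^k$, apart from the common factor $(1-\rho_i^2)^{n/2}$; for even $k = 2m$ this should agree with the coefficient of $(\rho_i^2 s_i^2)^m$ in ${}_2F_1(n/2, n/2; 1/2; \cdot)$. The function names are illustrative only.

```python
import math


def pochhammer(a, m):
    """Rising factorial (a)_m = a (a + 1) ... (a + m - 1)."""
    result = 1.0
    for j in range(m):
        result *= a + j
    return result


def integrated_coefficient(n, k):
    """Coefficient of (rho * s)**k after integrating over w and z,
    with the common factor (1 - rho**2)**(n / 2) removed."""
    return 2 ** k * math.gamma((n + k) / 2) ** 2 / (
        math.factorial(k) * math.gamma(n / 2) ** 2)


def hypergeometric_coefficient(n, m):
    """Coefficient of (rho**2 * s**2)**m in 2F1(n/2, n/2; 1/2; .)."""
    return pochhammer(n / 2, m) ** 2 / (pochhammer(0.5, m) * math.factorial(m))


n = 7
for m in range(6):
    print(m, integrated_coefficient(n, 2 * m), hypergeometric_coefficient(n, m))
```
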

The next step is to express the unit column vectors $\tau_i$ and $\sigma_j$ in terms of the
canonical correlations $r_i$ and the vectors $x_i$ and $y_i$ which determine these
correlations. Let $\mathfrak{p}$ and $\mathfrak{q}$ be the $p$-plane and the $q$-plane spanned by the vectors $\tau_i$
and $\sigma_j$ in $n$-space. Then $\mathfrak{p}$ and $\mathfrak{q}$ determine almost certainly (i.e. with
probability 1) the orthonormal vectors $x_i$ and $y_i$ $(i = 1, \ldots, p)$ which make the
critical angles between the planes, i.e. such that $x_i'y_i = r_i$, $x_i'y_j = 0$ $(i \ne j)$.
Let further vectors $y_{p+1}, \ldots, y_q$ be defined as functions of $\mathfrak{p}$ and $\mathfrak{q}$ to complete
an orthonormal set spanning $\mathfrak{q}$. $T$, $\Sigma$, $X$, $Y$ will denote the matrices composed
of the column vectors $\tau_i$, $\sigma_j$, $x_i$, $y_j$, respectively. It follows that $X'X = I_p$,
$Y'Y = I_q$ and $X'Y = [R : 0]$ where $R$ is the diagonal matrix with the $r_i$ down
the main diagonal. Furthermore we may write

$$T = XA, \qquad \Sigma = YB \tag{3.4}$$

where $A$ is a $p \times p$ and $B$ is a $q \times q$ matrix. Then $T'T = A'A$ and $\Sigma'\Sigma = B'B$.
The matrices $A$ and $B$ are subject only to the restriction that all their columns
$\alpha_i$ and $\beta_j$ are of unit length.

We now substitute for $\prod dS(\tau_i)$ and $\prod dS(\sigma_j)$ in (3.2), using the
transformations (3.4). To avoid interrupting the continuity of the argument we shall,
for the moment, only give the results of the substitution, and defer the proof
until section 5. We have then from (5.4)

$$\prod_{i=1}^{p} dS(\tau_i) = |A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\,d\mathfrak{p} + {}^*(dX)\,d\mathfrak{p} \tag{3.5}$$

where $dS(\alpha_i)$ is the element of area on the unit sphere in $p$-space and $d\mathfrak{p}$ is the
differential form representing the invariant measure on the Grassmann
manifold of $p$-planes in $n$-space. The symbol ${}^*(dX)$ stands for certain differentials
involving the elements of $X$, which, when subsequently multiplied by other
differentials, will vanish. Similarly

$$\prod_{j=1}^{q} dS(\sigma_j) = |B'B|^{(n-q)/2}\prod_{j=1}^{q} dS(\beta_j)\,d\mathfrak{q} + {}^*(dY)\,d\mathfrak{q} \tag{3.6}$$

where $dS(\beta_j)$ is the element of area on the unit sphere in $q$-space. Multiplying
(3.5) and (3.6) we obtain

$$\prod_{i=1}^{p} dS(\tau_i)\prod_{j=1}^{q} dS(\sigma_j) = |A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\,|B'B|^{(n-q)/2}\prod_{j=1}^{q} dS(\beta_j)\,d\mathfrak{p}\,d\mathfrak{q}. \tag{3.7}$$

The terms containing ${}^*(dX)$ and ${}^*(dY)$ vanish when multiplied by $d\mathfrak{p}\,d\mathfrak{q}$, since
$d\mathfrak{p}\,d\mathfrak{q}$ is of maximum degree in $\mathfrak{p}$ and $\mathfrak{q}$ and $X$ and $Y$ are functions of $\mathfrak{p}$ and $\mathfrak{q}$.
The differential form $d\mathfrak{p}\,d\mathfrak{q}$ may now be expressed in terms of the $r_i$ and other
variables. Integration with respect to these latter variables yields

$$K_p K_q\,\phi(r_i \mid \rho_i = 0)$$


where $K_p$ and $K_q$ are the normalising constants of the differential forms $d\mathfrak{p}$ and
$d\mathfrak{q}$ respectively, and $\phi(r_i \mid \rho_i = 0)$ is the null distribution of the $r_i$ (see James
[6]):
(rele = 0) = CTL errr = Horry TT =) [Tart 


tel i<) t=—1 


e/a) 


This distribution was first derived by Fisher [3], Hsu [5] and Roy [8]. The values
of $K_p$ and $K_q$ are given by


and 











Gin — 1+ 1) in ae 
Te Git?. a= ig, ome 
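After this integration, the right hand side of (3.7) becomes

$$K_p|A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\;K_q|B'B|^{(n-q)/2}\prod_{j=1}^{q} dS(\beta_j)\cdot\phi(r_i \mid \rho_i = 0), \tag{3.8}$$

showing that $A$, $B$ and the $r_i$ are independently distributed.
Substituting (3.3) and (3.8) in (3.2), we may write the distribution of the
$r_i$ as

$$\int_A\int_B \prod_{i=1}^{p}\left\{(1-\rho_i^2)^{n/2}\,{}_2F_1(n/2, n/2; 1/2; \rho_i^2 s_i^2)\right\} k_p|A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\;k_q|B'B|^{(n-q)/2}\prod_{j=1}^{q} dS(\beta_j)\;\phi(r_i \mid \rho_i = 0), \tag{3.9}$$

together with the relation

$$s_i = \tau_i'\sigma_i = \alpha_{1i}\beta_{1i}r_1 + \alpha_{2i}\beta_{2i}r_2 + \cdots + \alpha_{pi}\beta_{pi}r_p. \tag{3.10}$$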
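
Relation (3.10) follows from $\tau_i = X\alpha_i$, $\sigma_i = Y\beta_i$ and $X'Y = [R : 0]$, and it can be illustrated with one explicit pair of frames. The sketch below builds a convenient (hypothetical) choice of orthonormal $x_i$ and $y_j$ in $(p + q)$-space with the required inner products; the numerical values of $r$, $\alpha$ and $\beta$ are arbitrary.

```python
import math


def canonical_frames(r, q):
    """Orthonormal x_1..x_p and y_1..y_q in (p + q)-space with
    x_i'y_i = r_i and x_i'y_j = 0 for i != j (one convenient choice)."""
    p = len(r)
    n = p + q
    x, y = [], []
    for i in range(p):
        v = [0.0] * n
        v[i] = 1.0
        x.append(v)
    for j in range(q):
        v = [0.0] * n
        if j < p:
            v[j] = r[j]
            v[p + j] = math.sqrt(1.0 - r[j] ** 2)
        else:
            v[p + j] = 1.0
        y.append(v)
    return x, y


def combine(frame, coords):
    """Linear combination of the frame vectors, e.g. tau_i = X alpha_i."""
    return [sum(c * vec[row] for c, vec in zip(coords, frame))
            for row in range(len(frame[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


r = [0.9, 0.5]                  # canonical correlations, p = 2
alpha = [0.6, 0.8]              # unit vector in p-space
beta = [2 / 3, 1 / 3, 2 / 3]    # unit vector in q-space, q = 3
x, y = canonical_frames(r, q=3)
s_direct = dot(combine(x, alpha), combine(y, beta))
s_formula = sum(a * b * ri for a, b, ri in zip(alpha, beta, r))
print(math.isclose(s_direct, s_formula))
```
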


The normalising constants $k_p$ and $k_q$ for the distribution of $A$ and $B$ are given
by


Ce~i+t 
(3.11) = I1' ae , = me 
In view of equation (3.10) we may now identify our distribution (3.9) with
Bartlett's distribution, [1], equations (8) and (10).

If the hypergeometric functions are expanded as power series and multiplied
together, the function multiplying $\phi(r_i \mid \rho_i = 0)$ is seen to be a multiple power
series in the $\rho_i$ whose coefficients depend on the expectations of monomials in
the $s_i$ with respect to the distribution

$$k_p|A'A|^{(n-p)/2}\,dS(\alpha_1)\cdots dS(\alpha_p) \tag{3.12}$$

of $A$ and a similar distribution of $B$.

So far we have ignored the odd function of $s_i$ appearing in the integral (3.3).
However, any odd function $f(s_i)$ of $s_i$ will have zero expectation. In fact, putting
$-\alpha_i$ instead of $\alpha_i$ does not alter the distribution (3.12) of $A$, but changes $s_i$ to
$-s_i$ in view of (3.10). Therefore,

$$E[f(s_i)] = E[f(-s_i)] = E[-f(s_i)] = -E[f(s_i)]$$

and so $E[f(s_i)] = 0$. It is sufficient, therefore, to compute only moments of the
form $\mu(t_1, t_2, \ldots, t_p) = E\{(s_1^2)^{t_1}(s_2^2)^{t_2}\cdots(s_p^2)^{t_p}\}$ where the expectations are
taken with respect to the distributions of $A$ and $B$ and the $r_i$ are held fixed.
Furthermore, if we substitute in (3.9) for $s_i$ using (3.10), the calculations are
reduced to finding the moments of the $\alpha_{ij}$ and $\beta_{ij}$, two independent sets of
variates.

Theoretically these moments could be found directly from the distributions
of $A$ and $B$. However, as Bartlett pointed out, this method is too difficult
algebraically to be of much use, except in the case of only one non-zero $\rho_i$.
Bartlett indicated a method whereby moments of the form $\mu(t_1, t_2)$ could be
calculated, and also calculated $\mu(1, 1, 1)$ by employing various relations connecting
the $\alpha$-moments (see section 10). Again, both of these methods led to awkward
algebra and had to be abandoned for moments of higher order, though
Bartlett was able to compute $\mu(1, 1)$, $\mu(2, 1)$, $\mu(2, 2)$ and $\mu(3, 1)$. In part B of
this paper we shall present a method enabling moments of any order to be
computed, and shall complete the tabulation of moments up to the fourth
order with $\mu(2, 1, 1)$ and $\mu(1, 1, 1, 1)$.


4. The non-central means case. Let $\mathfrak{p}$ be the random plane spanned by the
vectors $\xi_1, \ldots, \xi_p$ and $\mathfrak{q}$ the fixed plane spanned by $\eta_1, \eta_2, \ldots, \eta_q$. As we
saw in section 2, we may assume that the $\xi_1, \ldots, \xi_p$ are independently
distributed and their components $\xi_{\nu i}$ have the distribution

$$\prod_{i=1}^{p}\left\{(2\pi)^{-n/2}\exp\left[-(\xi_i'\xi_i - 2\xi_i'\eta_i + \eta_i'\eta_i)/2\right]\prod_{\nu=1}^{n} d\xi_{\nu i}\right\}. \tag{4.1}$$

Furthermore, the $\eta_j$ $(j = 1, \ldots, q)$ may be taken as vectors lying along the
first $q$ co-ordinate axes in the sample space and thus having only one non-zero
component each, say $\mu_j$ in the $j$th position.
Putting $\xi_i = \tau_i w_i$ as before, (4.1) becomes

$$\prod_{i=1}^{p}\left\{\frac{1}{2^{n/2-1}\Gamma(n/2)}\exp\left[-(w_i^2 - 2\mu_i s_i w_i + \mu_i^2)/2\right]w_i^{n-1}\,dw_i\right\} \times \prod_{i=1}^{p}\frac{\Gamma(n/2)}{2\pi^{n/2}}\,dS(\tau_i) \tag{4.2}$$

where $s_i = \tau_{ii}$. The integral with respect to $w_i$ of the $i$th factor in the first
product of (4.2) is ${}_1F_1(n/2; 1/2; \mu_i^2 s_i^2/2)\,e^{-\mu_i^2/2}$ + an odd function of $s_i$. This odd
function will again vanish in subsequent integrations and may be ignored from
now on.


Let $X$ be the $n \times p$ matrix whose columns are the orthonormal vectors $x_1$,
$x_2, \ldots, x_p$ spanning $\mathfrak{p}$ and which make the critical angles with $\mathfrak{q}$. The $\tau_i$ may
be expressed as linear combinations of the $x_i$ by putting

$$T = XA. \tag{4.3}$$

Since $X'X = I$, we have $T'T = A'A$. From section 5, (5.4), it follows that

$$\prod_{i=1}^{p} dS(\tau_i) = |A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\,d\mathfrak{p}, \tag{4.4}$$

the differential form ${}^*(dX)\,d\mathfrak{p}$ vanishing since $X$ and $\mathfrak{p}$ are functions of each
other.
To express $\mathfrak{p}$ in terms of the $r_i$, we partition $X$ as follows:

$$X = \begin{bmatrix} Y \\ Z \end{bmatrix} \tag{4.5}$$

where $Y$ is a $q \times p$ matrix and $Z$ is an $(n - q) \times p$ matrix. The vector $\begin{bmatrix} y_i \\ 0 \end{bmatrix}$ in
$\mathfrak{q}$ makes the $i$th critical angle with $x_i$ in $\mathfrak{p}$. Let $\beta_i$ and $\gamma_i$ $(i = 1, \ldots, p)$ be the
unit vectors along $y_i$ and $z_i$, then according to [6], equation (7.10),

$$y_i = \beta_i r_i, \qquad z_i = \gamma_i\sqrt{1 - r_i^2} \tag{4.6}$$

and

1 K 

Oo Sel Ge <i +f 


(4.7) 
a , 
- adv (8) [[2-1 @ G(r = i+ 1) dV (y)o(r; pi = 0) 


where $K_p$ and $G(i)$ are defined in section 3, and $dV(\beta)$ and $dV(\gamma)$ are the
invariant measures on the Stiefel manifolds of $p$-frames $(\beta_1, \ldots, \beta_p)$ in $q$-space and
$p$-frames $(\gamma_1, \ldots, \gamma_p)$ in $(n - q)$-space. The constant has been split up to
normalise these invariant measures. If we choose $q - p$ orthonormal vectors $\beta_{p+1}$,
$\ldots, \beta_q$ orthogonal to $\beta_1, \ldots, \beta_p$ we may express $dV(\beta)$ as


(4.8) dV (8) = II 6: ds; IY 116) as,. 


j=pt+1 ial 


Also,

$$s_i = \tau_{ii} = \sum_{j=1}^{p} x_{ij}\alpha_{ji} = \sum_{j=1}^{p} \beta_{ij}r_j\alpha_{ji}.$$

If we please, we may replace $\beta_{ij}$ by $\beta_{ji}$ since they have the same distribution.
Integrating (4.7) with respect to $\gamma$, substituting in (4.4) and then in (4.2), we
obtain the distribution of the $r_i$ as


I, | TT sFu(n/2; 1/2; ui sile s kp| A’A|™”” * TT asta) 


B i= 
(4.9) 
IB.6(r; | ps = ( 
“TT? PSS ie TL, 185 43.6070 = )( 
where

$$s_i = \alpha_{1i}\beta_{1i}r_1 + \cdots + \alpha_{pi}\beta_{pi}r_p. \tag{4.10}$$

We notice that the distribution of $A$ is identical with its distribution in the
previous case, but now the distribution of $B$ is the invariant distribution on a
Stiefel manifold and is independent of $n$. However, $A$ and $B$ are still
independent.


5. Distribution of the co-ordinates of random vectors in a random plane.
In relation to the rest of the paper, the purpose of this section is to derive
equation (3.5) and a result at the end of section 7. However, the results will be more
interesting and intelligible if discussed in terms of probabilities.

$\tau_1, \ldots, \tau_p$ are invariantly distributed unit vectors in $n$-space, which we
write as the columns of an $n \times p$ matrix $T$. $\mathfrak{p}$ is the plane spanned by the $\tau_i$.
We define in $\mathfrak{p}$ a reference set of orthonormal vectors, which we write as the
columns of an $n \times p$ matrix $X$. Thus $X$ is a function of $\mathfrak{p}$ and

$$X'X = I. \tag{5.1}$$

Let the column $\alpha_i$ of the $p \times p$ matrix $A$ be the co-ordinates of $\tau_i$ relative to
the reference set $X$:

$$T = XA. \tag{5.2}$$

We shall show that $\mathfrak{p}$ is invariantly distributed and that $A$ is independently
distributed with density proportional to

$$|A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i). \tag{5.3}$$

These results are implicit in Bartlett [1]. They follow from the lemma which
we shall now state and prove. For the application in section 3 we shall have to
generalise the situation slightly to include the case when the reference set $X$ is
not necessarily a function of $\mathfrak{p}$ alone.

LEMMA. If $T$ is an $n \times p$ matrix whose columns $\tau_i$ are unit vectors, and $X$ and
$A$ are $n \times p$ and $p \times p$ matrices satisfying (5.1) and (5.2), then

$$\prod_{i=1}^{p} dS(\tau_i) = |A'A|^{(n-p)/2}\prod_{i=1}^{p} dS(\alpha_i)\,d\mathfrak{p} + {}^*(dX)\,d\mathfrak{p} \tag{5.4}$$

where ${}^*(dX)$ is a differential form in $X$ and $A$, every term of which is of at least
first degree in $dX$. If $X$ is a function of $\mathfrak{p}$ alone, then ${}^*(dX)\,d\mathfrak{p} = 0$.

PROOF. Selecting a single column from the matrix equation (5.2) we have

$$\tau_i = X\alpha_i. \tag{5.5}$$

Differentiating:

$$d\tau_i = dX\,\alpha_i + X\,d\alpha_i. \tag{5.6}$$

As the differential form for $dS(\alpha_i)$ will be required, we introduce $p - 1$
orthonormal column vectors in $p$-space orthogonal to $\alpha_i$. Let $C_i$ be the $p \times (p - 1)$
matrix with them as columns. Then $dS(\alpha_i)$ is the alternating product of the
elements in the vector $C_i'\,d\alpha_i$.

The differential form for $dS(\tau_i)$ requires $n - 1$ orthonormal vectors
orthogonal to $\tau_i$. The columns of the matrix $XC_i$ provide $p - 1$ of them, since
$C_i'X'\tau_i = C_i'X'X\alpha_i = C_i'\alpha_i = 0$. Choose the remaining $n - p$ orthonormal
vectors orthogonal to the plane $\mathfrak{p}$ and let them be columns of an $n \times (n - p)$
matrix $B$, which is to be a function merely of $\mathfrak{p}$.

Premultiply (5.6) by the transpose of the partitioned matrix $[XC_i : B]$:

$$\begin{bmatrix} C_i'X'\,d\tau_i \\ B'\,d\tau_i \end{bmatrix} = \begin{bmatrix} C_i'X'\,dX\,\alpha_i + C_i'\,d\alpha_i \\ B'\,dX\,\alpha_i \end{bmatrix}. \tag{5.7}$$

Then, the alternating product of the differentials of the vector on the left is
$dS(\tau_i)$ and hence the product of all of these for $i = 1, \ldots, p$ is the density
on the left-hand side of (5.4).
The alternating product of all the differentials in the right-hand side of (5.7)
for $i = 1, \ldots, p$ will give the density in the new co-ordinates. Let us deal with
the vector differentials $B'\,dX\,\alpha_i$ first. These $p$ vector differentials, corresponding
to $i = 1, \ldots, p$, comprise the columns of the matrix $B'\,dX\,A$, of whose elements
we therefore want the alternating product. The alternating product of the
elements of a row of this matrix is $|A|$ times the product of the elements of the
corresponding row of $B'\,dX$. There being $n - p$ rows in $B'\,dX\,A$, the alternating product
of all its elements is then $|A|^{n-p}\prod_i\prod_j b_i'\,dx_j$. The differential form

$$\prod_i\prod_j b_i'\,dx_j$$

is the invariant measure, $d\mathfrak{p}$, on the Grassmann manifold, i.e. the uniform
distribution of a $p$-plane in $n$-space (see [6]).

As the differential forms represent probability densities and must therefore
be positive, we replace $|A|$ by its modulus $|A'A|^{1/2}$.

The product of the elements of the vector $C_i'\,d\alpha_i$ is $dS(\alpha_i)$. All the products
involving an element of $C_i'X'\,dX\,\alpha_i$ we lump together in the symbol ${}^*dX$.
Collecting all factors we obtain (5.4). Q.E.D.

We conclude with a result that we shall need in section 7. From (5.1) and
(5.2) we have $T'T = A'A$. Hence, if $A$ has the distribution (5.3) then the
moments of $A'A$ are the same as the moments of $T'T$ where $T$ has the distribution

$$\prod_{i=1}^{p} dS(\tau_i).$$

Part B


6. Introduction. In this part of the paper we shall be concerned with the
problem of calculating the conditional moments

$$\mu(t_1, t_2, \ldots, t_p) = E[(s_1^2)^{t_1}(s_2^2)^{t_2}\cdots(s_p^2)^{t_p}]$$

required for the expansion of the distribution of the canonical correlations $r_i$.

Recalling the results of sections 3 and 4, we saw that the expectations of
monomials in the $s_i$ could be replaced by the expectations of monomials $m(A, B)$
in $\alpha_{ij}\beta_{ij}$ in view of the relation

$$s_i = \alpha_{1i}\beta_{1i}r_1 + \cdots + \alpha_{pi}\beta_{pi}r_p. \tag{6.1}$$

Furthermore, since $A = (\alpha_{ij})$ is distributed independently of $B = (\beta_{ij})$,

$$E[m(A, B)] = E[m(A)]\,E[m(B)]$$

where $m(A)$ and $m(B)$ are monomials in the elements of $A$ and $B$ respectively.
Considering case (a) for the moment, we saw that the distributions of $A$ and
$B$ were

$$k_p|A'A|^{(n-p)/2}\,dS(\alpha_1)\cdots dS(\alpha_p), \tag{6.2}$$

and

$$k_q|B'B|^{(n-q)/2}\,dS(\beta_1)\cdots dS(\beta_q)$$

respectively. Consequently, $E[m(B)]$ may be obtained from $E[m(A)]$ by simply
replacing $p$ with $q$.

In case (b), though the distribution of $A$ is still given by (6.2), the
distribution of $B$ is given by (4.8), the invariant distribution on the Stiefel manifold
of $p$-frames in $q$-space. We notice, however, that if we let $n \to \infty$ in case (a),
then the set of random vectors $(\beta_1, \ldots, \beta_p)$ becomes a rigid $p$-frame, and this,
of course, is exactly the situation in case (b). Hence the $\beta$-moments may be
obtained from those in case (a) by letting $n \to \infty$. To summarise, then, it is
sufficient to compute only the moments of the distribution (6.2).

To compute these moments by direct integration is obviously going to lead
to involved algebra. However, by first averaging the monomials $m(A)$ over the
orthogonal group we can considerably simplify the problem. Before proceeding
further we shall briefly discuss this important process.


7. Average over the orthogonal group. The process $\mathfrak{M}$ of averaging over a
group is a linear process whereby a function, defined on a space on which a
group of transformations acts, is changed into a function invariant under the
group. In particular, we consider the group $\mathfrak{O}$ of all orthogonal matrices $H$, and
a matrix $A = (\alpha_{ij})$ which is transformed by the elements of $\mathfrak{O}$:

$$A \to HA. \tag{7.1}$$

If $f(A)$ is a function of the elements of $A$, then

$$\mathfrak{M}f(A) = \int_{\mathfrak{O}} f(H^{-1}A)\,dV(H)$$

is a function invariant under the transformations (7.1). $V(H)$ is the invariant
measure on the orthogonal group, normalised so that $V(\mathfrak{O}) = 1$. $\mathfrak{M}f$ is called
the average or mean value of the function over the group. (This definition of
"mean value" should not be confused with the usual statistical definition.)
Since $\mathfrak{M}f$ is invariant under the orthogonal group, it must be expressible as a
function of the basic invariants $\alpha_i'\alpha_j$ (see Weyl [9], pp. 52-6).
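
For $p = 2$ the averaging process can be carried out explicitly, since the orthogonal group then consists of the rotations and reflections of the plane. The sketch below averages a function of $HA$ over a uniform grid of angles for both components; averaging over $H$ or over $H^{-1}$ gives the same result because the measure is invariant. The grid size and the test matrix are arbitrary choices, and the function name is illustrative only.

```python
import math


def average_over_o2(f, a, steps=360):
    """Average f(H A) over O(2), using rotations and reflections
    on a uniform grid of angles."""
    total = 0.0
    for step in range(steps):
        theta = 2 * math.pi * step / steps
        c, s = math.cos(theta), math.sin(theta)
        for h in ([[c, -s], [s, c]], [[c, s], [s, -c]]):
            ha = [[sum(h[i][k] * a[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]
            total += f(ha)
    return total / (2 * steps)


a = [[0.3, -1.2], [0.7, 0.5]]           # columns alpha_1 and alpha_2
square = average_over_o2(lambda m: m[0][0] ** 2, a)
mixed = average_over_o2(lambda m: m[0][0] * m[0][1], a)
# Both averages depend on A only through the invariants alpha_i'alpha_j.
print(square, (a[0][0] ** 2 + a[1][0] ** 2) / 2)
print(mixed, (a[0][0] * a[0][1] + a[1][0] * a[1][1]) / 2)
```

In this small case the averaged monomials $\alpha_{11}^2$ and $\alpha_{11}\alpha_{12}$ reduce to $\alpha_1'\alpha_1/2$ and $\alpha_1'\alpha_2/2$, in line with the statement that the average is a function of the basic invariants.
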

We wish to calculate the expectations of monomials $m(A)$ in the elements of
$A$. Since the distribution (6.2) is invariant under the transformations (7.1),
$E[m(A)] = E[m(H^{-1}A)]$, and hence it follows that

$$E[m(A)] = \int_{\mathfrak{O}} E[m(A)]\,dV(H) = \int_{\mathfrak{O}} E[m(H^{-1}A)]\,dV(H) = E\int_{\mathfrak{O}} m(H^{-1}A)\,dV(H) = E[\mathfrak{M}m(A)].$$

In section 8 we shall show how to calculate $\mathfrak{M}m(A)$.

However, assuming for the moment that this has been done, we see that the
problem has been reduced to the evaluation of the expectations of certain
invariant functions $\phi(A'A)$, say. At this point it should be noted that the
problem of the $\beta$-moments in case (b) has been completely solved. For, if we let
$n \to \infty$, then $B'B = I$ with probability 1, and hence $E[m(B)] = \phi(I)$. $E[m(B)]$
can be then evaluated by the method given in James [7], pp. 374-5. However,
since we require the $\beta$-moments for case (a), we may as well compute those
for case (b) by letting $n \to \infty$ in the former moments, as indicated in section 6.

For the $\alpha$-moments (and the $\beta$-moments for case (a)), we still have to
evaluate the expectations of the invariant functions. In section 5 we have shown
that the $\alpha_i'\alpha_j$ have the same distribution as quantities $\tau_i'\tau_j$ where $\tau_1, \ldots, \tau_p$
are independently uniformly distributed unit vectors in $n$-space. Finally, then,
there remains the calculation of the moments of the $\tau_i'\tau_j$. This will be
accomplished in section 9.

8. Calculation of $\mathfrak{M}m(A)$. In section 7 it was shown that

$$E[m(A)] = E[\mathfrak{M}m(A)] = E[\phi(A'A)].$$

In this section we shall show how to evaluate $\mathfrak{M}m(A)$.

Let

$$m(A) = \alpha_{i_1 j_1}^{k_1}\alpha_{i_2 j_2}^{k_2}\cdots \tag{8.1}$$

denote a monomial in the $\alpha_{ij}$. Then if $C$ is an arbitrary $p \times p$ matrix, the
expansion of the function $\exp(\operatorname{tr} C'A)$ contains every monomial (8.1) multiplied
by the same monomial $m(C)$ in the corresponding elements of $C$, and divided
by $k_1!\,k_2!\cdots$. James [7] has shown that $\mathfrak{M}\exp(\operatorname{tr} C'A)$ can be expanded as a
multiple power series in the elementary symmetric functions $z_1, z_2, \ldots, z_p$ of
the latent roots of $C'CA'A$. Thus, if $\lambda_1, \ldots, \lambda_p$ are the latent roots of $C'CA'A$,
then

$$z_1 = \sum_i \lambda_i = \operatorname{tr} C'CA'A,$$

$$z_2 = \sum_{i<j} \lambda_i\lambda_j = \text{sum of principal 2nd order minors of } C'CA'A, \text{ etc.,}$$


9 
21 21 22 


IM ex tr C’A) = 1 2 Si aie 2 
aaa + 35+ mtd * P@tD@—D 


3 
21 2122 


+ 8-3! p(p + 2)(p + 4) + 4p(p +. 2)(p + 4)(p _ 1) 
4 
ee le ~ +- nm = 
p(p + 2)(p + 4)(p — 1)(p —2) 244! p(p + 2)(p + 4)(p + 6) 
i cleanness z1z a 
l6p(p + 2)(p + 4)(p + 6)(p — 1) 


* 8p(p + 2)(p + 4)(p + 6)(p — 1)(p + 1) 
.—_— iieitesnsasass te 2)zs 
2p(p + 2)(p + 4)(p + 6)(p — 1)(p + 1)(p — 2) 
i iceeeeeiniinnnemans ee aes 
2p(p + 2)(p + 4)(p + 6)(p — 1)(p + 1)(p — 2)(p — 3) 
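Hence, $\mathfrak{M}m(A)$ can be found by equating the coefficients of $m(C)$ on both sides
of (8.2).
If we write $A'A$ in the form

$$A'A = \begin{bmatrix} 1 & \alpha_1'\alpha_2 & \alpha_1'\alpha_3 & \cdots & \alpha_1'\alpha_p \\ \alpha_1'\alpha_2 & 1 & \alpha_2'\alpha_3 & \cdots & \alpha_2'\alpha_p \\ \alpha_1'\alpha_3 & \alpha_2'\alpha_3 & 1 & \cdots & \alpha_3'\alpha_p \\ \vdots & & & & \vdots \\ \alpha_1'\alpha_p & \alpha_2'\alpha_p & \alpha_3'\alpha_p & \cdots & 1 \end{bmatrix} \tag{8.3}$$

we see that $\mathfrak{M}m(A)$ will be a linear combination of monomials in the invariants
$\alpha_i'\alpha_j$. The expansion (8.2) is sufficient to compute all conditional moments up
to order 4. If higher moments are required, further terms can be added to (8.2)
by the use of recurrence relations derived from the differential equations given
in James [7].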


9. Calculation of the moments of the invariants. We are given that $\tau_1, \tau_2,
\ldots, \tau_p$ are independently uniformly distributed column vectors in $n$-space,
and we require the expectations of monomials in $\tau_i'\tau_j$. If a monomial in $\tau_i'\tau_j$
were expanded as a sum of monomials in the $\tau_{ij}$, the expectations of each of
these could be calculated and summed. However, the expansions would become
very complicated. They can be avoided by the following method, which is an
extension of an idea due to Bartlett [1] p. 13.

Let $e_1, e_2, \ldots, e_p$ be the unit vectors along the first $p$ coordinate axes. Then
the joint distribution of $\tau_1, \ldots, \tau_p$ is the same as that of $A_1e_1, A_2e_2, \ldots,
A_pe_p$, where the $A_i$ are random orthogonal matrices independently and
invariantly distributed (see James [6]). Furthermore, the invariant functions
will not be altered if they are calculated from the vectors $e_1, A_1'A_2e_2, \ldots,
A_1'A_pe_p$. These vectors have the same distribution as $e_1, A_2e_2, \ldots, A_pe_p$ since
$A_1'A_2, \ldots, A_1'A_p$ are still independently invariantly distributed. Again, if
$A_2 = (a_{ij})$, say, the invariant functions will not be altered if we replace the
vectors by

$$e_1,\ B_2A_2e_2,\ \ldots,\ B_2A_pe_p$$

where

$$B_2 = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & a_{22}/b_{22} & \cdots & a_{n2}/b_{22} \\ \vdots & & & \vdots \end{bmatrix},$$

$b_{22}^2 = 1 - a_{12}^2 = a_{22}^2 + \cdots + a_{n2}^2$, and the remaining elements are chosen so
that $B_2$ is orthogonal. Clearly,

$$B_2A_2e_2 = \begin{bmatrix} a_{12} \\ b_{22} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Since the matrices $B_2A_3, \ldots, B_2A_p$ are still independently invariantly
distributed we may replace the vectors by

$$e_1,\ B_2A_2e_2,\ A_3e_3,\ \ldots,\ A_pe_p.$$

Proceeding in this way we see that we obtain the same values for the
expectations of the invariants if we replace $\tau_1, \tau_2, \ldots, \tau_p$ by

$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \begin{bmatrix} a_{12} \\ b_{22} \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \begin{bmatrix} a_{13} \\ a_{23} \\ b_{33} \\ \vdots \\ 0 \end{bmatrix},\quad \ldots \tag{9.1}$$

(To avoid introducing further notation, we have denoted the third column of
$A_3$ by the elements $a_{13}, a_{23}, \ldots, a_{n3}$, those of the fourth column of $A_4$ by
$a_{14}, a_{24}, \ldots, a_{n4}$, etc. Then $b_{33}^2 = 1 - a_{13}^2 - a_{23}^2$, $b_{44}^2 = 1 - a_{14}^2 - a_{24}^2 - a_{34}^2$,
etc.)

EXAMPLE 1. As an example let us evaluate

$$E[(\alpha_1'\alpha_2)(\alpha_2'\alpha_3)(\alpha_3'\alpha_4)(\alpha_1'\alpha_4)] = E[(\gamma_1'\gamma_2)(\gamma_2'\gamma_3)(\gamma_3'\gamma_4)(\gamma_1'\gamma_4)].$$

Substituting from (9.1), this expectation is equal to

$$E[a_{12}(a_{12}a_{13} + b_{22}a_{23})(a_{13}a_{14} + a_{23}a_{24} + b_{33}a_{34})a_{14}]. \tag{9.2}$$

Now any monomial in the $a_{ij}$, $b_{ii}$ containing an odd power has zero expectation
since the distribution is unaltered if we replace $a_{ij}$ by $-a_{ij}$ or $b_{ii}$ by $-b_{ii}$.
Hence, (9.2) reduces to $E(a_{12}^2a_{13}^2a_{14}^2)$. $a_2$, $a_3$ and $a_4$ are independently uniformly
distributed unit vectors, and hence $E(a_{12}^2) = E(a_{13}^2) = E(a_{14}^2) = 1/n$. Therefore,

$$E[(\alpha_1'\alpha_2)(\alpha_2'\alpha_3)(\alpha_3'\alpha_4)(\alpha_1'\alpha_4)] = 1/n^3.$$
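
The value $1/n^3$ obtained in Example 1 can be checked by simulation. The following is a minimal Monte Carlo sketch and not part of the original derivation; the function names, the seed, the choice n = 3 and the number of draws are all assumptions made for illustration. It draws four independent uniformly distributed unit vectors, multiplies the four inner products that appear in the example, and averages over the draws.

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniformly distributed unit vector in n-space (a normalised Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


def cyclic_moment(n, draws, seed=0):
    """Monte Carlo estimate of E[(g1'g2)(g2'g3)(g3'g4)(g1'g4)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        g1, g2, g3, g4 = (random_unit_vector(n, rng) for _ in range(4))
        total += dot(g1, g2) * dot(g2, g3) * dot(g3, g4) * dot(g1, g4)
    return total / draws


n = 3  # illustrative dimension
estimate = cyclic_moment(n, draws=100_000)
print(estimate, 1 / n**3)  # the two numbers should be close
```

The estimate only agrees with $1/n^3$ up to sampling error, so a comparison should allow a tolerance of a few standard errors.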
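EXAMPLE 2. $E(\Delta)$ where $\Delta = |A'A|$.
Put

$$C = \begin{pmatrix} 1 & a_{12} & a_{13} & \cdots \\ 0 & b_{22} & a_{23} & \cdots \\ 0 & 0 & b_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

Then $\Delta = |C'C| = |C|^2$, and

$$E(\Delta) = E(1 \cdot b_{22}^2 b_{33}^2 \cdots b_{pp}^2) = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{p-1}{n}\right)$$

since $E(b_{22}^2) = 1 - E(a_{12}^2) = 1 - 1/n$, etc.


10. Example of the calculation of the conditional moments. Following Bartlett,
we introduce the notation²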
9 
2 2 /Q2 g2 - 
E(ay a2) E(B Bi) = (3), 
(10.1) c 
E(ai; a2) (Bh Bee) = ( 2. etc. 
From equation (6.1) it is seen that the conditional moments can be expressed 
as linear combinations of “arrays” similar to those in (10.1). As we saw in sec- 
tion 6, it is sufficient to calculate the a-moments only. 


To illustrate the method let us calculate the a-moment or “half-factor”
corresponding to 


1.e. Ek ( 1110113021 Age 3033 ). 


The first step is to calculate $Em(A)$. Now,


. - Cnet °*-. Ox@e H -*-. 
C'CA'A = inte TF -**. ” : 
oad CuGig s+. CO Tt ***. 


All remaining terms in C’C may be neglected as they will not contribute to m(C) 
in the expansion (8.2). 


8 = 48( cya) (xxx) (axja3)m(C) + +++, 


‘ , , , 5 , 2 , 2 , 2 ’ 
2122 = 4} 3(aya2)(a2as)(aia3) — (aya2)) — (a2a3) — (aya3)}m(C) + 
’ ' | 
] a) Ae a 3 | 
= > ’ ’ ’ | 
“3; = 2m(C) a1 Ae l A2 a3 4 
, , 


a, a3 ae a3 l 


Hence, after calculating the expectations of the invariant functions by the 
method of section 9, we have 


E(zi) = = m(C) + --- 


/) ia 
E (2, 22) = 1) eit +... 


2 


E (zs) = 2(n ae ——— m(C) + eee, 


² Actually our notation differs slightly from Bartlett's. Whereas Bartlett worked in terms
of row vectors, we have worked in terms of column vectors, and hence Bartlett's $a_{ij}$
corresponds to our $a_{ji}$.

Substituting in (8.2) and equating the coefficients of $m(C)$, we obtain after
simplification


‘ (n — p)(2n — p) 

E (aan a1; 21 & 22 a32 33) = —— = : —F 
n*p(p + 2)(p + 4)(p — 1)(p — 2) 
which agrees with the value tabulated by Bartlett.

Any other a-moment can be calculated in a similar fashion. In particular,
the moments tabulated by Bartlett were checked and the various terms
contained in $\mu(2, 1, 1)$ and $\mu(1, 1, 1, 1)$ have been calculated and included in the
appendix. Actually, only the a-moments have been tabulated. The complete
value for case (a) may be obtained by multiplying the a-moment by a similar
value with q replacing p. The complete value for case (b) is obtained by taking
the previous value and letting $n \to \infty$ in the second half.

Incidentally, the a-moments may be checked by an independent method.
For example, consider the monomial $a_{11}^4a_{12}^2$. If we multiply it by $\alpha_3'\alpha_3$, which
is identically unity, then $E[a_{11}^4a_{12}^2(\alpha_3'\alpha_3)] = E[a_{11}^4a_{12}^2]$. But expanding the term
on the left-hand side, we get

$$E[a_{11}^4a_{12}^2] = E[a_{11}^4a_{12}^2a_{13}^2] + E[a_{11}^4a_{12}^2a_{23}^2] + E[a_{11}^4a_{12}^2a_{33}^2] + \cdots,$$

and therefore


4 a . 
4 
() - 2 +o-2 -, 
“/ \2 . 2 
Similarly, by expanding $(\alpha_1'\alpha_2)^2(\alpha_1'\alpha_3)^2$, whose expectation is $1/n^2$, we have
4 2.2 ae 2 2 
pi2)+ pip— 1,2 -}+4p(p— 111 1)+2p(p—)D]1 1 
(2 - 2 ) 1 1 
a -~ & § 
+ 2p(p — 1)(p — 2), - 1 1) +4p(p— I(p—2)]1 1 - 
os ak 
sz i i 
+ pp — 1)(p — 2)(p— 3)f1 1 - + p= 1/n’. 
ae oe 
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SIGNIFICANCE LEVEL AND POWER' 


By E. L. Lehmann
University of California, Berkeley 


1. Summary and introduction. Significance testing, as described in most
textbooks, consists in fixing a standard significance level $\alpha$ such as .01 or .05
and rejecting the hypothesis $\theta = \theta_0$ if a suitable statistic $Y$ exceeds $C$ where
$P_{\theta_0}\{Y > C\} = \alpha$. Such a procedure controls the probability of false rejection
(error of the first kind) at the desired level $\alpha$ but leaves the power of the test and
hence the probability of an error of the second kind to the mercy of the
experiment. It seems more natural when deciding on a significance level (and this
suggestion is certainly not new) to take into account also what power can be
achieved with the given experiment. In Section 3 a specific suggestion will be
made as to how to balance $\alpha$ against the power $\beta$ obtainable against the
alternatives of interest.

The adoption of this or some similar rule for choosing a significance level has
important consequences for the theory of testing composite hypotheses, where
nuisance parameters are present. Since the quantity $\alpha$ is then potentially a
function of the nuisance parameter $\vartheta$, the classical rule of a fixed significance
level leads to the condition that the tests be exact or similar, that is, that $\alpha(\vartheta)$
equal the preassigned value $\alpha$ for all $\vartheta$. On the other hand, the power $\beta$ that
can be attained against any alternative $\theta = \theta_1$ frequently depends on $\vartheta$. The
requirement that $\alpha(\vartheta)$ and $\beta(\vartheta)$ be in a certain balance thus leads to tests which
are not similar and hence do not agree with the standard solutions.

To obtain a suitable setting for this discussion, we consider first a minimal
complete class of tests for testing the hypothesis $H: \theta \le \theta_0$ in a multiparameter
exponential family (Section 2). The proposed $\alpha, \beta$-relation is discussed in Section
3, and in Section 4 is applied to the exponential family. Section 5 gives some
illustrations of the theory.


2. A complete class theorem. Many standard testing problems concern an
exponential family of distributions, which has probability densities of the form

$$p_{\theta,\vartheta}(x) = C(\theta, \vartheta) \exp\Big[\theta U(x) + \sum_{i=1}^{r} \vartheta_i T_i(x)\Big] h(x) \tag{1}$$

with respect to a $\sigma$-finite measure $\mu$, where $\theta$, $U$, the $\vartheta_i$ and $T_i$ are real-valued
and where $\vartheta = (\vartheta_1, \cdots, \vartheta_r)$. In this family, the statistics $U$ and $T =
(T_1, \cdots, T_r)$ constitute a set of sufficient statistics for $(\theta, \vartheta)$.

The problem of testing the hypothesis $H: \theta \le \theta_0$ against the one-sided
alternatives $\theta > \theta_0$ has been treated by many authors (usually in the formulation
$\theta = \theta_0$ against $\theta > \theta_0$). The solution of this testing problem according to the
Neyman-Pearson theory is the uniformly most powerful unbiased test; this
depends only on $U$ and $T$ and is given by the critical function²

$$\varphi(u, t) = \begin{cases} 1 & \text{if } u > C(t), \\ \gamma(t) & \text{if } u = C(t), \\ 0 & \text{if } u < C(t), \end{cases} \tag{2}$$

where the functions $C$ and $\gamma$ are determined by the conditions $E_{\theta_0}[\varphi(U, T) \mid T =
t] = \alpha$ and $E_{\theta_0}[U\varphi(U, T) \mid T = t] = \alpha E_{\theta_0}[U \mid T = t]$ for all $t$. The condition
of unbiasedness

$$E_{\theta,\vartheta}\varphi(U, T) \lesseqgtr \alpha \quad \text{as} \quad \theta \lesseqgtr \theta_0$$

and that of similarity

$$E_{\theta_0,\vartheta}\varphi(U, T) = \alpha \quad \text{for all } \vartheta$$

which it implies and which by itself is sufficient to justify the test, are not
inherent in the problem but are imposed, at least in part, to facilitate the solution.
Before proposing an alternative approach, it is interesting to see how far the
problem can be reduced without the introduction of extraneous principles. This
can be done by viewing it within the framework of decision theory.

Let $d_0$ and $d_1$ denote the decisions of accepting and rejecting the hypothesis $H$,
and denote by $L_i(\theta, \vartheta)$ the loss resulting from decision $d_i$ when $(\theta, \vartheta)$ are the true
parameter values. Then for fixed $\vartheta$, the function $L_0(\theta, \vartheta)$ typically will be zero
for $\theta \le \theta_0$ and increasing for $\theta \ge \theta_0$, while $L_1(\theta, \vartheta)$ will be decreasing for $\theta \le \theta_0$
and zero for $\theta \ge \theta_0$. In particular, the difference then satisfies

$$L_1(\theta, \vartheta) - L_0(\theta, \vartheta) \gtreqless 0 \quad \text{as} \quad \theta \lesseqgtr \theta_0. \tag{3}$$

The risk function of a test $\varphi$, which is the expected loss resulting from its use
considered as a function of the parameters, is

$$R_\varphi(\theta, \vartheta) = \int \big\{\varphi(U(x), T(x)) L_1(\theta, \vartheta) + [1 - \varphi(U(x), T(x))] L_0(\theta, \vartheta)\big\}\, p_{\theta,\vartheta}(x)\, d\mu(x). \tag{4}$$

Let $\mathcal{C}$ be the class of all tests satisfying (2) for some functions $C$ and $\gamma$. For
all loss functions satisfying (3) it was shown by Truax [13] that $\mathcal{C}$ is essentially
complete; that is, given any $\varphi$ there exists $\varphi' \in \mathcal{C}$ such that

$$R_{\varphi'}(\theta, \vartheta) \le R_\varphi(\theta, \vartheta) \quad \text{for all } (\theta, \vartheta). \tag{5}$$

We shall now prove that among essentially complete classes, $\mathcal{C}$ is minimal in
the sense that if (5) holds for two tests $\varphi$, $\varphi'$ in $\mathcal{C}$, then $\varphi = \varphi'$ a.e. $\mu$.

² See for example [7].
³ Recently I learned that this result has been obtained also by D. L. Burkholder. His
results are sketched in Abstract 18, Ann. Math. Stat., Vol. 29 (1958), p. 616.

Let $\varphi$ and $\varphi'$ belong to $\mathcal{C}$ and let

$$\alpha(\vartheta) = E_{\theta_0,\vartheta}\varphi(U, T), \qquad \alpha'(\vartheta) = E_{\theta_0,\vartheta}\varphi'(U, T). \tag{6}$$

(i) If the functions $\alpha$ and $\alpha'$ do not agree for all $\vartheta$, suppose without loss of
generality that there exists $\vartheta_0$ such that $\alpha(\vartheta_0) < \alpha'(\vartheta_0)$. Since for $\vartheta = \vartheta_0$, the
expected values of $\varphi$ and $\varphi'$ are continuous functions of $\theta$, there exist $\theta_1 < \theta_0 < \theta_2$
such that

$$E_{\theta,\vartheta_0}\varphi(U, T) < E_{\theta,\vartheta_0}\varphi'(U, T) \quad \text{for } \theta = \theta_1 \text{ and } \theta = \theta_2. \tag{7}$$

Then $R_\varphi(\theta_1, \vartheta_0) < R_{\varphi'}(\theta_1, \vartheta_0)$ and $R_\varphi(\theta_2, \vartheta_0) > R_{\varphi'}(\theta_2, \vartheta_0)$, and hence neither
of the procedures $\varphi$ and $\varphi'$ is uniformly better than the other. (ii) Suppose on
the other hand that $\alpha(\vartheta) = \alpha'(\vartheta)$. The standard proof showing a similar test
satisfying (2) to be uniformly most powerful similar also shows that a test $\varphi_0$
satisfying (2) and

$$E_{\theta_0,\vartheta}\varphi_0(U, T) = \alpha(\vartheta) \quad \text{for all } \vartheta \tag{8}$$

is uniformly most powerful among all tests satisfying (8). The tests $\varphi$ and $\varphi'$ are
therefore both uniformly most powerful within this class and hence

$$E_{\theta,\vartheta}\varphi(U, T) = E_{\theta,\vartheta}\varphi'(U, T) \quad \text{for all } \theta > \theta_0 \text{ and all } \vartheta.$$

Since the family of distributions of the sufficient statistics $(U, T)$ is complete,
it follows that $\varphi(u, t) = \varphi'(u, t)$ a.e., as was to be proved.


3. Significance level and power. It follows from the result of the preceding
section that the class $\mathcal{C}$ of tests (2) represents the maximum reduction that can
be achieved by comparing only tests of which one has a uniformly better risk
function than the other. The selection of a specific test from $\mathcal{C}$ involves two
difficulties. It requires the adoption of some principle (Bayes, minimax, etc.)
leading to a definite choice; in addition, it requires knowledge of the loss
functions $L_0$ and $L_1$. An alternative approach, utilizing the fortunate circumstance
that the complete class is independent of the actual loss functions (subject only
to their satisfying (3)), consists in making the choice by some simple rule of
thumb, which does not require (the usually unavailable) knowledge of these
losses.

Consider the simplest case of the family (1) with $r = 0$, which involves no
nuisance parameters. The family of tests (2) is then a one-parameter family, one
test corresponding to each value of

$$\alpha_0 = E_{\theta_0}\varphi(X), \qquad 0 \le \alpha_0 \le 1.$$

A simple method of choice consists in specifying a value of $\alpha_0$ and selecting the
test corresponding to this value. This need not be a purely formal or arbitrary
procedure since $\alpha_0$ as the maximum probability of false rejection is of course an
important quantity in its own right.

⁴ Particular proposals of this kind that have been made in the literature include those
of Jeffreys [5] involving considerations of a priori probabilities, and of Lindley [8] based on
his concept of unlikelihood.

Nevertheless, as was pointed out in Section 1, the above rule appears to neglect
too many aspects of the problem. In particular, suppose that the alternatives of
primary interest, for which it is important to reject the hypothesis, are those
satisfying $\theta \ge \theta_1$ ($\theta_0 < \theta_1$). Since the power function of any test (2) is increasing
in $\theta$, the probability $\beta_1$ of rejection when $\theta = \theta_1$ is the minimum power against
these alternatives. It seems then reasonable that the choice of test should involve
at least $\beta_1$ in addition to $\alpha_0$.

The quantities $\alpha_0$ and $\alpha_1 = 1 - \beta_1$ are the error probabilities associated with
the problem of testing the simple hypothesis $\theta = \theta_0$ against the simple
alternative $\theta = \theta_1$. The attainable pairs $(\alpha_0, \alpha_1)$ form a convex set, the lower boundary
of which corresponds to the admissible tests (2). This lower boundary is a convex
curve $S$ connecting the points (0, 1) and (1, 0), and what is needed is a reasonable
way of selecting a point on each such curve. One possible approach to this
question is in terms of indifference curves. Suppose that a system of curves could be
specified in the $(\alpha_0, \alpha_1)$-plane such that any two points lying on the same curve
are equally desirable, with the curves closer to the origin being more desirable
than those further away. The optimum test would then be given by that point
of $S$ lying on the indifference curve closest to the origin (Fig. 1).

It seems likely that even this approach is too complex for most applications.
To obtain an even simpler formulation, consider once more the rule of fixing the
significance level without regard to power. If the level is $\alpha$, this means restricting
attention to the points $(\alpha_0, \alpha_1)$ lying on the vertical line segment $L: \alpha_0 = \alpha$,
$0 \le \alpha_1 \le 1 - \alpha$. The test then corresponds to the point $(\alpha, \alpha_1)$, which is the
intersection of $S$ and $L$. This procedure is commonly justified on the grounds
that the error of the first kind is of a higher order of importance, and should
therefore be controlled at the prescribed level. However, if the curve $S$ is
sufficiently close to the $\alpha_0$- and $\alpha_1$-axis, as will always be the case if the sample size
is sufficiently large, then $\alpha_1$ is much smaller than $\alpha_0$, which is inconsistent with
the assumed relative importance of the two errors.

[Fig. 1. Indifference curves in the $(\alpha_0, \alpha_1)$-plane.]

A more reasonable solution is obtained if one replaces the vertical line segment
$L$ by a curve $C: \alpha_1 = f(\alpha_0)$ where $f$ is a continuous strictly increasing function
with $f(0) = 0$. A particularly simple choice for $f$ is a linear function

$$\alpha_1 = k\alpha_0. \tag{9}$$

Since $\alpha_1 \le 1 - \alpha_0$ for all admissible tests, one has $\alpha_0 \le 1/(k + 1)$ so that
$1/(k + 1)$ is an upper bound for $\alpha_0$. As an example, consider (9) with $k = 9$.
If $\beta_1 = 1 - \alpha_1$ denotes the power of a test against the alternative $\theta_1$, some typical
pairs of values of $(\alpha_0, \beta_1)$ are

    $\alpha_0$    .1    .05    .04    .03    .02    .01    .005
    $\beta_1$     .1    .55    .64    .73    .82    .91    .955

with .1 being an upper bound for $\alpha_0$.

One would of course hope to avoid cases such as $\alpha_0 = .1$, $\beta_1 = .1$ or even
$\alpha_0 = .05$, $\beta_1 = .55$. When no nuisance parameters are present, this can be achieved
by taking a sample of sufficient size. In the composite case, on the other hand,
it can frequently not be achieved by samples of fixed size no matter how large,
but only by resorting to sequential experimentation.
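
The pairs tabulated for $k = 9$ follow directly from (9), since $\beta_1 = 1 - k\alpha_0$. The sketch below reproduces them and then illustrates how the rule selects a test in one concrete setting. That setting is an assumption made purely for illustration and is not taken from the text: a statistic Z that is N(0, 1) under $\theta_0$ and N(delta, 1) under $\theta_1$, with rejection for large Z. The function names and the bisection bounds are likewise hypothetical.

```python
from statistics import NormalDist


def power_under_rule(alpha0, k):
    """Power beta_1 = 1 - alpha_1 implied by the linear rule alpha_1 = k * alpha_0."""
    return 1.0 - k * alpha0


def balanced_normal_test(delta, k):
    """Cutoff c (reject when Z > c) for which alpha_1 = k * alpha_0.

    Assumed model: Z ~ N(0, 1) under the hypothesis, Z ~ N(delta, 1) under the
    alternative, delta >= 0. Returns (c, alpha_0, alpha_1).
    """
    std = NormalDist()
    lo, hi = -10.0, 10.0 + delta
    # alpha_1(c) - k * alpha_0(c) is increasing in c, so bisection applies.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        alpha0 = 1.0 - std.cdf(mid)
        alpha1 = std.cdf(mid - delta)
        if alpha1 - k * alpha0 > 0.0:
            hi = mid
        else:
            lo = mid
    c = (lo + hi) / 2.0
    return c, 1.0 - std.cdf(c), std.cdf(c - delta)


levels = (0.1, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005)
table = [round(power_under_rule(a, 9), 3) for a in levels]
print(table)

cutoff, a0, a1 = balanced_normal_test(delta=3.0, k=9)
print(cutoff, a0, a1)
```

In the assumed normal model, increasing delta (for instance through a larger sample) moves the selected point toward the origin, so both error probabilities shrink together instead of only $\alpha_1$.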

To avoid misunderstandings, it should be emphasized that (9) is not being
proposed as a logically convincing rule, nor as one fitting all occasions. Actually,
it seems clear that no rule satisfying these requirements exists, except the Bayes
solution when sufficient knowledge concerning losses and a priori probabilities is
available. In the absence of this knowledge it may be convenient to employ a
simple rule of thumb. Such a rule is in fact being used in much of present practice:
It consists in choosing $\alpha$ to be .05 or .01 depending on the seriousness attached
to the committing of an error of the first kind. To this, (9) is suggested as an
alternative which appears to be more reasonable in many cases.

It so happens that (9) is the minimax solution if the loss for rejecting $H: \theta \le \theta_0$
is $a_0$ when $H$ is true, and the loss is $a_1$ for accepting $H$ when $\theta \ge \theta_1$, where the
constant $k$ of (9) is then given by $k = a_0/a_1$. However, this is not the basis for
the present suggestion of (9), and the minimax property does not carry over to
the application to be made in the next section to composite hypotheses.


4. Conditional tests. We return now to the composite case of the exponential
family (1) with $r > 0$. The minimal complete class $\mathcal{C}$ is then more complex than
in the preceding section, its members being characterized by the function $\alpha(\vartheta)$
instead of the single number $\alpha_0$. Given any function $\alpha(\vartheta)$, which is the
expectation of some critical function $\varphi$, there exists a unique member of $\mathcal{C}$ whose
expectation function for $\theta = \theta_0$ is also $\alpha(\vartheta)$. This uniformly minimizes the risk (and
maximizes the power) among all critical functions having this expectation.

If the alternatives of interest are as before those satisfying $\theta \ge \theta_1$, let $\beta(\vartheta)$
denote the power function of a test against the alternative $(\theta_1, \vartheta)$. The proposal
made in the preceding section suggests selecting that member of $\mathcal{C}$ which satisfies

$$1 - \beta(\vartheta) = k\alpha(\vartheta) \quad \text{for all } \vartheta. \tag{10}$$

However, this relationship depends on the particular parametrization chosen,
and we shall not discuss it here. Instead an alternative approach will be proposed
in which this difficulty does not arise.

Consider once more the case of the similar test with $\alpha(\vartheta) = \alpha$. Since $T$ is a
complete sufficient statistic for $\vartheta$ when $\theta = \theta_0$, the functions $C$ and $\gamma$ of (2) are
determined by the requirement that the conditional probability of rejection

$$\alpha^*(t) = P_{\theta_0}\{U > C(t) \mid t\} + \gamma(t) P_{\theta_0}\{U = C(t) \mid t\}$$

be equal to $\alpha$ for all $t$.⁵ However, the conditional power $\beta^*(t) = P_{\theta_1}\{\text{rejecting }
H \mid t\}$ of the test against the alternative $\theta = \theta_1$ typically depends on $t$. The
question then arises: Suppose that $\beta^*(t)$ is quite small for the observed $t$, or
quite high; is this value not more relevant to the case in hand than the average
value $\beta(\vartheta)$?

Without entering into the difficulties raised by this question, there is an
alternative and simpler justification for considering $\beta^*(t)$. The actual power $\beta$
against the alternative $\theta = \theta_1$ generally depends on the nuisance parameter $\vartheta$
and is therefore unknown. It can however be estimated from the observations,
and $\beta^*(T)$ is the unbiased estimate with (uniformly) minimum variance. That it
is unbiased is clear since $\beta(\vartheta) = E_{\theta_1,\vartheta}\beta^*(T)$. The minimum variance property
is an immediate consequence of the completeness of the sufficient statistic $T$ for
$(\theta_1, \vartheta)$ and of Theorem 5.1 of [7].

⁵ This method of constructing exact tests was originated by Bartlett [1] and Neyman [9].
That in the present case it provides the totality of such tests has been noted by many
authors. For a recent discussion and references see [7].

Analogous remarks apply in the more general case, in which the tests are not
required to be exact. If the relevant frame of reference is obtained by considering
$t$ as fixed, the error probabilities of interest are the conditional probabilities
$\alpha_0^*(t) = P_{\theta_0}(\text{rejecting } H \mid t)$ and $\alpha_1^*(t) = P_{\theta_1}(\text{accepting } H \mid t)$, and the quantities
$C(t)$ and $\gamma(t)$ can therefore be determined from the relation

$$\alpha_1^*(t) = k\alpha_0^*(t). \tag{11}$$

The resulting test will of course not be similar. However, since $\alpha_0^*(t) \le
1/(k + 1)$ for all $t$, the quantity $1/(k + 1)$ is an upper bound also for the average
probability $\alpha_0(\vartheta)$ of an error of the first kind.

The above discussion applies only to problems in which the parameter of
interest is one of the “natural” parameters of the exponential distribution (1).
As was pointed out in [7], any parameter of the form $\theta + \sum a_i\vartheta_i$ is natural for a
suitable definition of $U$, the $T$'s and $\vartheta$'s. When the parameter of interest is not
of this form, related methods may be applicable as is indicated by the following
example.

If $X_1, \cdots, X_n$ are a sample from a normal distribution $N(\xi, \sigma^2)$, neither the
parameter $\xi$ nor $\xi/\sigma$ is of this form. The problem of testing $\xi/\sigma \le \delta_0$ against
$\xi/\sigma \ge \delta_1$ can be reduced by invariance considerations to the statistic
$\bar X / [\sum (X_i - \bar X)^2]^{1/2}$, the distribution of which depends on the single parameter
$\delta = \xi/\sigma$. If the hypothesis is rejected when $\bar X > C[\sum (X_i - \bar X)^2]^{1/2}$, with
$\alpha_0 = P_{\delta_0}\{\bar X > C[\sum (X_i - \bar X)^2]^{1/2}\}$ and $\alpha_1 = P_{\delta_1}\{\bar X \le C[\sum (X_i - \bar X)^2]^{1/2}\}$,
the quantity $C$ can be determined so that $\alpha_1 = k\alpha_0$. The problem of testing
$\xi \le \xi_0$ against $\xi \ge \xi_1$ appears to be more difficult; a possible approach may be
that of [4], Section 3.


5. Examples. We shall now briefly indicate some examples in which the
natural parameter $\theta$ is the relevant one so that the method of the preceding
sections is applicable. Of these, Examples 1, 2, 3 have been treated by the same
method (but from a different point of view) by Tocher [12], and Examples 2, 3 by
Sverdrup [11].

EXAMPLE 1. Let $X$, $Y$ be independent Poisson variables with $E(X) = \lambda$,
$E(Y) = \mu$, and consider the problem of testing $\mu/\lambda \le a_0$ against $\mu/\lambda \ge a_1$.
The joint distribution of $X$, $Y$ forms an exponential family with $T = X + Y$,
$U = Y$, $\theta = \log(\mu/\lambda)$ and $\vartheta = \log \lambda$. The conditional distribution of $Y$ given
$X + Y = t$ is a binomial distribution corresponding to the success probability
$p = \mu/(\lambda + \mu)$ and number of trials equal to $t$. In terms of $p$, the hypothesis
and class of alternatives becomes $p \le a_0/(a_0 + 1)$ and $p \ge a_1/(a_1 + 1)$ so
that the test satisfying (2) and (11) can be determined from a table of the
binomial distribution.
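
In place of a binomial table, the constants $C(t)$ and $\gamma(t)$ of Example 1 can be computed directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the success probabilities are assumed to lie strictly between 0 and 1, and the values $a_0 = 1$, $a_1 = 3$, $k = 9$, $t = 20$ are chosen only for illustration. It searches downward for the cutoff at which a randomization probability between 0 and 1 balances the two conditional error probabilities as in (11).

```python
from math import comb


def binom_pmf(y, t, p):
    """P{Y = y} for Y binomial with t trials and success probability p."""
    return comb(t, y) * p**y * (1 - p) ** (t - y)


def conditional_test(t, p0, p1, k):
    """Find (C, gamma) for the test (2) so that alpha_1*(t) = k * alpha_0*(t).

    The test rejects when y > C and rejects with probability gamma when y == C.
    Assumes 0 < p0 < p1 < 1. Returns (C, gamma, alpha0, alpha1).
    """
    for c in range(t, -1, -1):
        tail0 = sum(binom_pmf(y, t, p0) for y in range(c + 1, t + 1))
        tail1 = sum(binom_pmf(y, t, p1) for y in range(c + 1, t + 1))
        at0 = binom_pmf(c, t, p0)
        at1 = binom_pmf(c, t, p1)
        # Solve 1 - tail1 - gamma*at1 = k * (tail0 + gamma*at0) for gamma.
        gamma = (1.0 - tail1 - k * tail0) / (at1 + k * at0)
        if gamma <= 1.0:
            alpha0 = tail0 + gamma * at0
            alpha1 = 1.0 - tail1 - gamma * at1
            return c, gamma, alpha0, alpha1
    raise ValueError("no solution found; check that 0 < p0 < p1 < 1")


a0, a1, k, t = 1.0, 3.0, 9, 20        # illustrative values only
p0, p1 = a0 / (a0 + 1), a1 / (a1 + 1)  # binomial form of hypothesis and alternative
C, gamma, alpha0, alpha1 = conditional_test(t, p0, p1, k)
print(C, gamma, alpha0, alpha1)
```

Because the cutoff is recomputed for each observed $t$, the conditional level $\alpha_0^*(t)$ varies with $t$ but never exceeds $1/(k + 1)$.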

EXAMPLE 2. If $X$, $Y$ are independent variables with binomial distributions
$b(p_1, m)$ and $b(p_2, n)$, their joint distribution has the exponential form (1) with
$T = X + Y$, $U = Y$, $\theta = \log[(p_2/q_2) \div (p_1/q_1)]$ and $\vartheta = \log(p_1/q_1)$. The
method is therefore applicable to the problem of testing $p_2/q_2 \le a_0(p_1/q_1)$, and
in particular $p_2 \le p_1$ by letting $a_0 = 1$, against the alternatives $p_2/q_2 \ge
a_1(p_1/q_1)$. Putting $\rho = (p_2/q_2) \div (p_1/q_1)$, the conditional distribution of $Y$
given $t$ is

$$P_\rho\{Y = y \mid X + Y = t\} = C_t(\rho) \binom{m}{t - y}\binom{n}{y} \rho^y, \qquad y = 0, 1, \cdots, t, \tag{12}$$

which for $\rho = 1$ reduces to the hypergeometric distribution.
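
The conditional distribution (12) is easy to evaluate numerically, with $C_t(\rho)$ obtained as the reciprocal of the sum of the unnormalized terms. The sketch below is illustrative only: the function name and the sample values of m, n, t and $\rho$ are assumptions, and t is assumed not to exceed m + n. It also checks the remark that $\rho = 1$ gives the hypergeometric distribution.

```python
from math import comb


def conditional_pmf(m, n, t, rho):
    """Probabilities P_rho{Y = y | X + Y = t}, y = 0, ..., t, as in (12).

    The normalising constant C_t(rho) is one over the sum of the weights.
    math.comb returns 0 when the lower index exceeds the upper, which handles
    values of y that are impossible for the given m, n and t.
    """
    weights = [comb(m, t - y) * comb(n, y) * rho**y for y in range(t + 1)]
    total = sum(weights)
    return [w / total for w in weights]


m, n, t = 5, 7, 4                       # illustrative sample sizes and total
tilted = conditional_pmf(m, n, t, 2.0)  # odds ratio rho = 2
central = conditional_pmf(m, n, t, 1.0)  # rho = 1
hypergeometric = [comb(m, t - y) * comb(n, y) / comb(m + n, t) for y in range(t + 1)]
print(tilted)
print(central)
```

With these probabilities in hand, $C(t)$ and $\gamma(t)$ can be chosen to satisfy (11) in the same way as for the binomial case of Example 1.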
EXAMPLE 3. In a $2 \times 2$ table representing the results of classifying a sample of
size $s$ according to two characteristics $A$ and $B$, the joint distribution of the
numbers $X$, $Y$, $Y'$ in the


B X x M 
B y ee 
T 7 S 
categories AB, AB and AB constitute an exponential family with U Y; 


T,;= N+ Y,T. = Y + Y' andé@ = log (panpis/paspin). Putting A = (paspis 
Parpik) one finds 


4 1—A _ ‘ 1 1-—A 
dan = Dad Dae DAB; 045 = DiDdB - Dan Dak 
Par Pa Ps A PapPak Pak Pa Ps A Pau Pai 
1-—-A |. - ; 1—A 

4s = nh Dip DaB: = DiDe — 5a Dab 

Pai Papi A Pan Pak Pas Pap A PasPak 
where $p_{AB}$ denotes the probability of having the characteristics $A$ and $B$, $p_A =
p_{AB} + p_{A\bar{B}}$ the probability of having the characteristic $A$, etc. The quantity $\Delta$ is
therefore a measure of the degree of dependence,⁶ $\Delta = 1$ corresponding to
independence, $\Delta < 1$ to negative and $\Delta > 1$ to positive dependence. The method
of the preceding section is applicable to testing $\Delta \le 1$ or more generally $\Delta \le \Delta_0$
against the alternatives $\Delta \ge \Delta_1$. The conditional distribution of $Y$ given
$X + Y = t$, $Y + Y' = n$ is given by (12) with $\Delta$ in place of $\rho$.

EXAMPLE 4. Consider a number of paired comparisons $(U_k, V_k)$ where only
the sign of the difference $W_k = V_k - U_k$ is observed for each pair $k = 1, \cdots, n$.
If the probabilities of a positive, negative and zero observation are $p_+$, $p_-$ and $p_0$
in each case and if the comparisons are independent, the joint distribution of the
numbers $X$, $Y$ and $Z$ of negative, positive and zero cases is the multinomial
distribution

$$\frac{n!}{x!\,y!\,z!}\, p_-^x\, p_+^y\, p_0^z.$$

This is an exponential family with $U = Y$, $T = Z$, $\theta = \log(p_+/p_-)$ and $\vartheta =
\log(p_0/p_-)$. The test of $p_+ \le p_-$ (or $p_+ \le a_0p_-$) against $p_+ \ge a_1p_-$ is therefore
performed conditionally given $Z = t$. Since the conditional distribution of $Y$
given $Z = t$ is the binomial distribution $b(p_+/(p_+ + p_-), n - t)$, the constants
$C(t)$ and $\gamma(t)$ for which the test satisfies (2) and (11) can be obtained from the
binomial tables.⁷

⁶ $\Delta$ is equivalent to Yule's measure of association, which is $Q = (1 - \Delta)/(1 + \Delta)$. For a
discussion of this and related measures, see [2].

EXAMPLE 5. Let $Y_1, \cdots, Y_N$ be independently distributed according to the
binomial distributions $b(p_i, n_i)$, $i = 1, \cdots, N$ where

$$p_i = 1/(1 + e^{-(\alpha + \beta x_i)}).$$

This is the model frequently assumed in bioassay, where $x_i$ denotes the dose or
some function of the dose such as its logarithm, of a drug given to $n_i$
experimental subjects and where $Y_i$ is the number among these subjects which respond
to the drug at level $x_i$. Here the $x_i$ are known, and $\alpha$ and $\beta$ are unknown
parameters. The joint distribution of the $Y$'s is

$$e^{\alpha \sum y_i + \beta \sum x_i y_i} \prod_{i=1}^{N} \binom{n_i}{y_i} \left[\frac{1}{1 + e^{\alpha + \beta x_i}}\right]^{n_i}, \tag{13}$$

which is an exponential family with the parameters $\alpha$, $\beta$ and sufficient statistics
$\sum Y_i$, $\sum x_iY_i$. The method is therefore applicable to testing $\alpha \le \alpha_0$ against
$\alpha \ge \alpha_1$ or $\beta \le \beta_0$ against $\beta \ge \beta_1$. It is interesting to note that for the particular
case $x_i = ic$ and $H: \beta \le 0$, the conditional test given $\sum Y_i = t$ is a form of the
Wilcoxon test in a setting similar to that discussed by Haldane and Smith [3].

As a last example we mention without going into details the comparison of two
distributions of type (13). If the parameters in these are $\alpha$, $\beta$ and $\alpha'$, $\beta'$ the
differences $\alpha' - \alpha$ and $\beta' - \beta$ are natural parameters of the resulting exponential
families, and can therefore be tested by the method discussed here.


STEP-DOWN PROCEDURE IN MULTIVARIATE ANALYSIS¹
By J. Roy
University of North Carolina

1. Introduction and summary. Test criteria for (i) multivariate analysis of 
variance, (ii) comparison of variance-covariance matrices, and (iii) multiple 
independence of groups of variates when the parent population is multivariate 
normal are usually derived either from the likelihood-ratio principle [6] or from the
“union-intersection” principle [2]. An alternative procedure, called the “step-down”
procedure, has been recently used by Roy and Bargmann [5] in devising
a test for problem (iii). In this paper the step-down procedure is applied to
problems (i) and (ii) in deriving new tests of significance and simultaneous
confidence-bounds on a number of “deviation-parameters.”

The essential point of the step-down procedure in multivariate analysis is 
that the variates are supposed to be arranged in descending order of importance. 
The hypothesis concerning the multivariate distribution is then decomposed into 
a number of hypotheses—the first hypothesis concerning the marginal uni- 
variate distribution of the first variate, the second hypothesis concerning the 
conditional univariate distribution of the second variate given the first variate, 
the third hypothesis concerning the conditional univariate distribution of the 
third variate given the first two variates, and so on. For each of these component 
hypotheses concerning univariate distributions, well known test procedures 
with good properties are usually available, and these are made use of in testing 
the compound hypothesis on the multivariate distribution. The compound 
hypothesis is accepted if and only if each of the univariate hypotheses is
accepted. It so turns out that the component univariate tests are independent, if
the compound hypothesis is true. It is therefore possible to determine the level 
of significance of the compound test in terms of the levels of significance of the 
component univariate tests and to derive simultaneous confidence-bounds on 
certain meaningful parametric functions on the lines of [3] and [4]. 
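
Because the compound hypothesis is accepted only when every component hypothesis is accepted, and the component tests are independent under the compound hypothesis, the overall level is one minus the product of the component acceptance probabilities. The sketch below spells out that arithmetic; the function names are hypothetical, and the equal-split rule in the second function is just one convenient choice assumed here, not a prescription from the text.

```python
from math import prod


def compound_level(component_levels):
    """Level of the compound test when the component tests are independent.

    The compound hypothesis is accepted only if all components are accepted,
    so P(accept) is the product of the component acceptance probabilities.
    """
    return 1.0 - prod(1.0 - a for a in component_levels)


def equal_component_level(overall_level, p):
    """Common component level giving a prescribed overall level with p stages."""
    return 1.0 - (1.0 - overall_level) ** (1.0 / p)


two_stage = compound_level([0.05, 0.05])
per_stage = equal_component_level(0.05, 3)
print(two_stage, per_stage, compound_level([per_stage] * 3))
```

Unequal component levels are equally admissible, which is one way the a priori ordering of the variates can be reflected in the test.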

The step-down procedure obviously is not invariant under a permutation of 
the variates and should be used only when the variates can be arranged on a 
priori grounds. Some advantages of the step-down procedure are (i) the procedure 
uses widely known statistics like the variance-ratio, (ii) the test is carried out in 
successive stages and if significance is established at a certain stage, one can 
stop at that stage and no further computations are needed, and (iii) it leads to 
simultaneous confidence-bounds on certain meaningful parametric functions. 

1.1 Notations. The operator $\mathcal{E}$ applied to a matrix of random variables is used
to generate the matrix of expected values of the corresponding random variables.
The form of a matrix is denoted by a subscript; thus $A_{n \times m}$ indicates that the
matrix $A$ has $n$ rows and $m$ columns. The maximum latent root of a square
matrix $B$ is denoted by $\lambda_{\max}(B)$. Given a vector $a = (a_1, a_2, \cdots, a_t)'$ and a
subset $T$ of the natural numbers $1, 2, \cdots, t$, say $T = (j_1, j_2, \cdots, j_u)$ where
$j_1 < j_2 < \cdots < j_u$, the notation $T[a]$ will be used to denote the positive quantity:

$$T[a] = +(a_{j_1}^2 + a_{j_2}^2 + \cdots + a_{j_u}^2)^{1/2}.$$

$T[a]$ will be called the $T$-norm of $a$. Similarly, given a matrix $B_{t \times t}$, we shall
write $B_{(T)}$ for the $u \times u$ submatrix formed by taking the $j_1$th, $j_2$th, $\cdots$, $j_u$th
rows and columns of $B$. We shall call $B_{(T)}$ the $T$-submatrix of $B$.
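
The two definitions above translate directly into code. The following is a small sketch with hypothetical function names; vectors and matrices are plain Python lists, and the subset T is given with 1-based indices to match the notation of the text.

```python
import math


def t_norm(a, T):
    """T-norm of a: positive square root of the sum of squares of a_j for j in T."""
    return math.sqrt(sum(a[j - 1] ** 2 for j in T))


def t_submatrix(B, T):
    """T-submatrix of B: the rows and columns of B whose (1-based) indices are in T."""
    return [[B[i - 1][j - 1] for j in T] for i in T]


a = [3.0, 4.0, 12.0]
B = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(t_norm(a, (1, 2)), t_norm(a, (1, 2, 3)))
print(t_submatrix(B, (1, 3)))
```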


2. Step-down procedure in multivariate analysis of variance.

2.1 General linear hypothesis in univariate analysis. Let the elements of $y_{n \times 1}$
be one-dimensional random variables distributed independently and normally
with the same variance $\sigma^2$ and expectations given by

$$\mathcal{E}y = A\theta + X\beta \tag{1}$$

where elements of $\theta_{m \times 1}$ and $\beta_{q \times 1}$ are unknown parameters; $A_{n \times m}$ and $X_{n \times q}$ are
matrices of known constants with rank $(A) = r$ and rank $(A:X) = r + q$, with
$n > (r + q)$.

A set of $t$ linearly independent linear functions $\phi_{t \times 1} = B_{t \times m}\theta$, where $B$ is a
given matrix of rank $t$, is said to be estimable if for each element of $\phi$ there exists
an unbiassed estimate linear in $y$, for all values of $\theta$ and $\beta$. If $\phi$ is estimable,
there exists an estimator $\hat\phi_{t \times 1}$ of $\phi$, the elements of which are linear in $y$ and
minimum variance unbiassed estimators of the corresponding elements in $\phi$.
Denote the variance-covariance matrix of $\hat\phi$ by $C\sigma^2$, where $C_{t \times t}$ is a
positive-definite matrix. Let $s^2/(n - q - r)$ denote the usual error mean square with $(n - q - r)$
degrees of freedom giving an unbiassed estimator of $\sigma^2$. Then it is well known
that the statistics $u = (\hat\phi - \phi)'C^{-1}(\hat\phi - \phi)/\sigma^2$ and $v = s^2/\sigma^2$ are distributed
independently as chi-squares with $t$ and $(n - q - r)$ degrees of freedom respectively,
so that

$$\frac{u/t}{v/(n - q - r)} \tag{2}$$

is distributed as a variance-ratio with $t$ and $(n - q - r)$ degrees of freedom.

Let $\alpha$ be a preassigned constant, $0 < \alpha < 1$, and $f$ the upper $100\alpha$ per cent
point of the variance-ratio distribution with $t$ and $(n - q - r)$ degrees of freedom.
Setting $l^2 = tf/(n - q - r)$ we then have

$$(\hat\phi - \phi)'C^{-1}(\hat\phi - \phi) \le l^2s^2 \tag{3}$$

with probability $(1 - \alpha)$.

Now, the left-hand side of (3) is a positive definite quadratic form in $(\hat\phi - \phi)$
and consequently, we have

$$(\hat\phi - \phi)'C^{-1}(\hat\phi - \phi) \ge (\hat\phi - \phi)'(\hat\phi - \phi)/\lambda_{\max}(C). \tag{4}$$

We thus have

$$(\hat\phi - \phi)'(\hat\phi - \phi) \le l^2s^2\lambda_{\max}(C) \tag{5}$$

with probability not less than $(1 - \alpha)$.

Now, let $T$ be any subset of the natural numbers $1, 2, \cdots, t$ and consider the
$T$-norms $T[\phi]$ of $\phi$ and $T[\hat\phi]$ of $\hat\phi$. Then (3) implies that

$$T[\hat\phi] - ls\lambda_{\max}^{1/2}(C_{(T)}) \le T[\phi] \le T[\hat\phi] + ls\lambda_{\max}^{1/2}(C_{(T)}) \tag{6}$$

for all subsets $T$ of $(1, 2, \cdots, t)$, where $C_{(T)}$ is the $T$-submatrix of $C$. The statement
(6) thus provides simultaneous confidence-bounds on the parameters $T[\phi]$ for all
$T$ with probability not less than $(1 - \alpha)$. We note that there are in all $(2^t - 1)$
parameters of the type $T[\phi]$ and these in a sense measure the deviations from
the hypothesis $\mathcal{H}_0$ that $\phi = 0$. The analysis of variance test for $\mathcal{H}_0$ at level of
significance $\alpha$, of course, is given by the rule

$$\text{accept } \mathcal{H}_0 \text{ if } \frac{\hat\phi'C^{-1}\hat\phi/t}{s^2/(n - q - r)} \le f, \quad \text{otherwise reject } \mathcal{H}_0. \tag{7}$$

However, simultaneous confidence-bounds of the type (6) are more interesting
than the test (7) itself, because the direction of departure from the null hypothesis
is indicated.

2.2 Customary tests in multivariate analysis of variance. We have a matrix
$Y_{n \times p}$ of random variables, such that the rows are distributed independently,
each row having a $p$-variate normal distribution with the same variance-covariance
matrix $\Sigma_{p \times p}$ which is positive-definite. The expected values are given by

$$\mathcal{E} Y = A\Theta, \tag{8}$$

where $A_{n \times r}$ is a matrix of known constants of rank $r$, $r \le (n - p)$, and $\Theta_{r \times p}$
is a matrix of unknown parameters. As before, a set of linear parametric functions
$\Phi_{t \times p} = B_{t \times r}\Theta$ is said to be estimable if, for all $\Theta$, there exist unbiassed estimates
of $\Phi$ linear in $Y$. If $\Phi$ is estimable, customary tests for the hypothesis

$$\mathcal{H}_0 : \Phi = 0$$

are based on two $p \times p$ matrices of random variables

$$S_e = Y'EY \quad \text{and} \quad S_h = Y'HY, \tag{9}$$

called respectively the sum of products matrix due to error and the sum of
products matrix due to hypothesis. Here $E$ and $H$ are $n \times n$ symmetric idempotent
matrices with non-stochastic elements, $E$ of rank $(n - r)$ and $H$ of rank $t$,
$E$ being a function of $A$, and $H$ of both $A$ and $B$. The likelihood-ratio test [6] is

$$\text{accept } \mathcal{H}_0 \text{ if } L = \frac{|S_e|}{|S_e + S_h|} \ge c, \quad \text{otherwise reject } \mathcal{H}_0, \tag{10}$$

where $c$ is a preassigned constant depending on the level of significance. The test
based on the largest latent root [3] is

$$\text{accept } \mathcal{H}_0 \text{ if } \lambda_{\max}(S_h S_e^{-1}) \le d, \quad \text{otherwise reject } \mathcal{H}_0, \tag{11}$$

where $d$ is a constant depending on the level of significance. Simultaneous
confidence-bounds on certain meaningful parametric functions have been derived
by the largest (or the largest-smallest roots) procedure, [3] [4], whereas no such
bounds are available as of now from the likelihood-ratio procedure.

2.3 The step-down procedure. We shall denote the $i$th columns of the matrices $Y$
and $\Theta$ in Section 2.2 by $y_i$ and $\theta_i$ respectively and write $Y_i = [y_1\ y_2\ \cdots\ y_i]$ and
$\Theta_i = [\theta_1\ \theta_2\ \cdots\ \theta_i]$. Further, we shall denote the top left-hand $i \times i$ submatrix
of $\Sigma = ((\sigma_{ij}))$ by $\Sigma_i$.

Then, under the condition that $Y_i$ is fixed, the $n$ elements of the vector $y_{i+1}$
are distributed independently and normally, each with the same variance $\sigma_{i+1}^2$
and expectations given by

$$\mathcal{E} y_{i+1} = A\eta_{i+1} + Y_i\beta_i, \tag{12}$$

where $\beta_i$ is a vector of the form $i \times 1$ given by

$$\beta_i = \Sigma_i^{-1} (\sigma_{1,i+1}, \sigma_{2,i+1}, \cdots, \sigma_{i,i+1})', \qquad \beta_0 = 0, \tag{13}$$

and $\eta_{i+1}$ is a vector of the form $r \times 1$ given by

$$\eta_{i+1} = \theta_{i+1} - \Theta_i\beta_i \tag{14}$$

and

$$\sigma_{i+1}^2 = |\Sigma_{i+1}| / |\Sigma_i|, \tag{15}$$

with the understanding that $|\Sigma_0| = 1$ so that $\sigma_1^2 = \sigma_{11}$, $i = 0, 1, 2, \cdots, (p - 1)$.

The elements of the vectors $\beta_i$, $\eta_{i+1}$ may then be regarded as unknown
parameters. We shall call $\beta_i$ the $i$th order step-down regression coefficient and $\sigma_{i+1}^2$ the
$i$th order step-down residual variance.

Let us now consider linear functions

$$\phi_i = B\eta_i \qquad (i = 1, 2, \cdots, p). \tag{16}$$

If $Y_i$ is fixed, (12) is of the same form as (1). Let us now, with an easily
understood notation similar to that used in Section 2.1, construct the statistics

$$F_i = \frac{(\hat\phi_i - \phi_i)' C_i^{-1} (\hat\phi_i - \phi_i)/t}{s_i^2/(n - r - i + 1)} \qquad (i = 1, 2, \cdots, p). \tag{17}$$

Obviously, when $Y_{i-1}$ is fixed, the statistic $F_i$ is distributed as a variance ratio
with $t$ and $(n - r - i + 1)$ degrees of freedom $(i = 2, 3, \cdots, p)$. Finally, we note
that in its functional form $F_i$ involves only $Y_i$ $(i = 1, 2, \cdots, p)$ and that the
conditional distribution of $F_i$, given $Y_{i-1}$, does not involve $Y_{i-1}$ $(i = 2, 3, \cdots, p)$
and hence $F_1, \cdots, F_{i-1}$. Also, $F_1$ is marginally distributed as a variance-ratio
with $t$ and $(n - r)$ degrees of freedom. Therefore the statistics $F_1, F_2, \cdots, F_p$
are independent. This can be verified in a straightforward manner by using the
transformation to rectangular coordinates as in [5] or any other set of step-down
variates, or even otherwise.

For a preassigned constant $\alpha_i$, $0 < \alpha_i < 1$, let $f_i$ denote the upper $100\alpha_i$ per
cent point of the variance-ratio distribution with $t$ and $(n - r - i + 1)$ degrees
of freedom. Then the probability $P$ that simultaneously

$$F_i \le f_i \qquad (i = 1, 2, \cdots, p) \tag{18}$$

is given by

$$P = \prod_{i=1}^{p} (1 - \alpha_i). \tag{19}$$

Therefore, for any subset $T$ of the natural numbers $1, 2, \cdots, t$, writing as in (6),
$T[\phi_i]$ and $T[\hat\phi_i]$ for the $T$-norms of $\phi_i$ and $\hat\phi_i$ respectively, and setting

$$l_i^2 = t f_i/(n - r - i + 1) \tag{20}$$

and writing $C_{i(T)}$ for the $T$-submatrix of $C_i$, we have the simultaneous confidence
bounds

$$T[\hat\phi_i] - l_i s_i \lambda_{\max}^{1/2}(C_{i(T)}) \le T[\phi_i] \le T[\hat\phi_i] + l_i s_i \lambda_{\max}^{1/2}(C_{i(T)}) \tag{21}$$

for all subsets $T$ of $(1, 2, \cdots, t)$ and $i = 1, 2, \cdots, p$ with probability greater
than $P$.

To derive a test of the hypothesis $\mathcal{H}_0$ that $\Phi = 0$, we note that $\mathcal{H}_0$ is true if
and only if the hypothesis $\mathcal{H}_{0i}$ that $\phi_i = 0$ holds for all $i = 1, 2, \cdots, p$. Using
the result (17), we set up the following procedure for testing $\mathcal{H}_0$:

$$\text{accept } \mathcal{H}_0 \text{ if } u_i = \frac{\hat\phi_i' C_i^{-1} \hat\phi_i / t}{s_i^2/(n - r - i + 1)} \le f_i \text{ for all } i = 1, 2, \cdots, p; \quad \text{otherwise reject } \mathcal{H}_0. \tag{22}$$

Obviously, the level of significance for this test is $1 - P$ where $P$ is given by
(19). The arbitrariness in determining the $f_i$'s when the level of significance is
preassigned may be removed by stipulating that $\alpha_1 = \alpha_2 = \cdots = \alpha_p$. From
the fact that the variance-ratio test (7) is uniformly unbiassed, it can be seen
after a little consideration, that the test procedure (22) is also uniformly
unbiassed.
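
When the $\alpha_i$ are taken equal, the overall level $1 - P$ from (19) fixes their common value as $\alpha_i = 1 - (1 - \alpha)^{1/p}$. A short Python sketch of this arithmetic follows. The function names and the numerical values (overall level 0.05, $p = 4$) are illustrative assumptions, not taken from the paper.

```python
def per_step_level(alpha, p):
    """Common alpha_i for which 1 - (1 - alpha_i)**p equals the overall level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return 1.0 - (1.0 - alpha) ** (1.0 / p)


def overall_level(step_levels):
    """Overall level 1 - P, with P = prod(1 - alpha_i) as in (19)."""
    prob_accept = 1.0
    for a in step_levels:
        prob_accept *= 1.0 - a
    return 1.0 - prob_accept


# Hypothetical example: overall level 0.05 spread equally over p = 4 steps.
alpha_i = per_step_level(0.05, 4)
recovered = overall_level([alpha_i] * 4)
print(alpha_i, recovered)
```

Each $f_i$ would then be the upper $100\alpha_i$ per cent point of the variance-ratio distribution with $t$ and $(n - r - i + 1)$ degrees of freedom, read from tables or a statistical library.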

To carry out the test one should first compute $u_1$. If $u_1 > f_1$, $\mathcal{H}_0$ is rejected
and no further computations are needed. If $u_1 \le f_1$, the next step is to compute
$u_2$. If $u_2 > f_2$, $\mathcal{H}_0$ is rejected and no further computations are needed. If $u_2 \le f_2$
one proceeds to compute $u_3$ and so on. This way one need compute $u_i$ if and only
if $u_j \le f_j$ for $j = 1, 2, \cdots, i - 1$. Much computational labor is saved thereby.
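
The sequential character of the procedure can be sketched in Python as below. This is only an illustration under stated assumptions: the callables stand in for whatever routine computes each $u_i$, and the numerical values of the statistics and critical points are made up.

```python
def step_down_test(statistic_fns, critical_points):
    """Run the step-down test; return (accept, number_of_statistics_computed).

    statistic_fns[i]() is a hypothetical callable computing u_{i+1} on demand,
    so a statistic is evaluated only if every earlier one stayed within its bound.
    """
    if len(statistic_fns) != len(critical_points):
        raise ValueError("need one critical point per statistic")
    computed = 0
    for compute_u, f in zip(statistic_fns, critical_points):
        u = compute_u()
        computed += 1
        if u > f:
            return False, computed  # reject H_0; later statistics are never computed
    return True, computed


# Illustrative (made-up) values of u_1, u_2, u_3 and of f_1, f_2, f_3.
calls = []


def make_statistic(index, value):
    def compute():
        calls.append(index)  # record which statistics were actually evaluated
        return value
    return compute


fns = [make_statistic(1, 1.2), make_statistic(2, 5.0), make_statistic(3, 0.3)]
accept, n_computed = step_down_test(fns, [3.0, 3.1, 3.2])
print(accept, n_computed, calls)
```

In this made-up run the second statistic exceeds its critical point, so the third is never evaluated.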

It is well known that the likelihood-ratio statistic $L$ given by (10) can be
expressed as

$$L = \prod_{i=1}^{p} \frac{n - r - i + 1}{(n - r - i + 1) + t u_i} \tag{23}$$

and this has been utilized [1] to obtain the moments of $L$ when $\mathcal{H}_0$ is true.
However, the step-down procedure based on the individual $u_i$'s rather than on a
single function $L$, is advantageous from the point of view of (i) setting up
simultaneous confidence bounds and (ii) saving computational labor, especially in the
situation indicated in the introduction.


3. Step-down procedure for variance-covariance matrices. Let $S_{p \times p} =
((s_{ij}))$ be a symmetric matrix of random variables, distributed in Wishart's form
with $n$ degrees of freedom, $n > p$, so that $S/n$ provides an unbiassed estimate for
the variance-covariance matrix $\Sigma$ of a $p$-variate normal population. In the same
way as in Section 2.3, we shall write $S_i$ for the $i \times i$ top left-hand submatrix
of $S$ and let

$$b_i = S_i^{-1} (s_{1,i+1}, s_{2,i+1}, \cdots, s_{i,i+1})', \qquad b_0 = 0, \tag{24}$$

$$s_{i+1}^2 = |S_{i+1}| / |S_i|, \qquad s_1^2 = s_{11}, \tag{25}$$

for $i = 1, 2, \cdots, p - 1$. Let $\beta_{i-1}$ and $\sigma_i^2$ be defined by (13) and (15) for $i =
1, 2, \cdots, p$. Then it is well known that when $S_i$ is fixed, the distribution of $b_i$
is independent of the distribution of $s_{i+1}^2$; the distribution of $b_i$ is $i$-variate
normal with expectation $\beta_i$ and variance-covariance matrix $\sigma_{i+1}^2 S_i^{-1}$, and
$s_{i+1}^2/\sigma_{i+1}^2$ has the chi-square distribution with $(n - i)$ degrees of freedom,
$i = 1, 2, \cdots, (p - 1)$. Finally $s_1^2/\sigma_1^2$ has the chi-square distribution with $n$
degrees of freedom.

When more than one variance-covariance matrix is involved, we shall
distinguish them by a superscript in parentheses. Thus with a number of population
variance-covariance matrices $\Sigma^{(j)}$ and the corresponding Wishart matrices
$S^{(j)}$ the quantities $\beta_i^{(j)}$, $(\sigma_i^{(j)})^2$, $b_i^{(j)}$, $(s_i^{(j)})^2$, etc., will be defined in the same way as in
(13), (15), (24), and (25) for $j = 1, 2, \cdots$, etc.

3.1 One variance-covariance matrix. On the basis of a matrix $S$ distributed in
Wishart's form with $n$ degrees of freedom, with $S/n$ providing an unbiassed
estimate for $\Sigma$, it is possible to set up simultaneous confidence-bounds on
parameters which are functions of the elements of $\Sigma$ by the step-down procedure as
follows.

When $S_i$ is fixed, the statistics $u = (b_i - \beta_i)' S_i (b_i - \beta_i)/\sigma_{i+1}^2$ and $v =
s_{i+1}^2/\sigma_{i+1}^2$ are distributed independently as chi-squares, $u$ with $i$ degrees of
freedom and $v$ with $n - i$ degrees of freedom. Therefore, given pre-assigned
positive constants $a_i$, $c_{i+1}$, and $d_{i+1}$, where $c_{i+1} < d_{i+1}$, the probability $P_{i+1}$ that

$$(b_i - \beta_i)' S_i (b_i - \beta_i)/s_{i+1}^2 \le a_i, \qquad c_{i+1} \le s_{i+1}^2/\sigma_{i+1}^2 \le d_{i+1} \tag{26}$$

holds for fixed $S_i$, is a constant depending only on $n$, $i$, $a_i$, $c_{i+1}$, and $d_{i+1}$. As
a matter of fact,

$$P_{i+1} = \int_{c_{i+1}}^{d_{i+1}} G_i(a_i x)\, g_{n-i}(x)\, dx \qquad (i = 1, 2, \cdots, p - 1), \tag{27}$$

where

$$G_\nu(x) = \int_0^x g_\nu(\xi)\, d\xi \tag{28}$$

and

$$g_\nu(x) = \frac{e^{-x/2} x^{\nu/2 - 1}}{2^{\nu/2}\, \Gamma(\nu/2)}. \tag{29}$$

Also, given preassigned positive constants $c_1$, $d_1$ $(c_1 < d_1)$, the marginal
probability $P_1$ that

$$c_1 \le s_1^2/\sigma_1^2 \le d_1 \tag{30}$$

is given by

$$P_1 = \int_{c_1}^{d_1} g_n(x)\, dx. \tag{31}$$


By an argument similar to that which follows (17) in Section 2.3, we obtain
the probability $P$ that (26), for $i = 1, 2, \cdots, p - 1$, and (30) hold simultaneously as

$$P = \prod_{i=1}^{p} P_i. \tag{32}$$

Now, as in Section 2.3, for a given subset $T_i$ of the integers $1, 2, \cdots, i$, writing
$T_i[\beta_i]$ and $T_i[b_i]$ for the $T_i$-norms of $\beta_i$ and $b_i$ respectively, and writing $U_{i(T_i)}$ for
the $T_i$-submatrix of $S_i^{-1}$, we have

$$\begin{aligned} s_i^2/d_i \le \sigma_i^2 &\le s_i^2/c_i \qquad \text{for } i = 1, 2, \cdots, p, \\ T_i[b_i] - a_i^{1/2} s_{i+1} \lambda_{\max}^{1/2}(U_{i(T_i)}) \le T_i[\beta_i] &\le T_i[b_i] + a_i^{1/2} s_{i+1} \lambda_{\max}^{1/2}(U_{i(T_i)}) \end{aligned} \tag{33}$$

for all subsets $T_i$ of $(1, 2, \cdots, i)$ and $i = 1, 2, \cdots, p - 1$. The statement (33) thus
provides simultaneous confidence-bounds on $p$ parameters of the type $\sigma_i^2$ and
$(2^p - p - 1)$ parameters of the form $T_i[\beta_i]$ with probability not less than $P$.

It is to be noted that to set up simultaneous confidence bounds of the type
(32), one has to evaluate the integral (27) which is not usually available in
tabulated form. Another meaningful procedure, which, incidentally, avoids this
difficulty, is to set up separate sets of simultaneous confidence bounds: one on
$\sigma_1^2, \cdots, \sigma_p^2$, using the chi-square distribution for $s_i^2/\sigma_i^2$, with a preassigned
probability and another set on the step-down regressions $\beta_i$, using the
variance-ratio distribution for $(b_i - \beta_i)' S_i (b_i - \beta_i)/s_{i+1}^2$, and with a probability not
less than a preassigned level.

We suggest a slightly different procedure for testing the hypothesis $\mathcal{H}_0$ that $\Sigma$
has a specified value $\Sigma_0$. This hypothesis may be reformulated in terms of the
step-down regression-coefficients and residual variances as follows: the
hypothesis $\mathcal{H}_0$ is true if and only if each of the hypotheses

$$\mathcal{H}_{0i} : \sigma_i^2 = \sigma_{i0}^2, \quad i = 1, 2, \cdots, p, \qquad \mathcal{H}'_{0i} : \beta_i = \beta_{i0}, \quad i = 1, 2, \cdots, p - 1,$$

is true, where $\sigma_{i0}^2$, $\beta_{i0}$ are derived from $\Sigma_0$ the same way as $\sigma_i^2$, $\beta_i$ are derived from
$\Sigma$. The test procedure suggested is:

$$\begin{aligned} &\text{accept } \mathcal{H}_0 \text{ if} \\ &\qquad c_i \le s_i^2/\sigma_{i0}^2 \le d_i \qquad (i = 1, 2, \cdots, p), \\ &\qquad (b_i - \beta_{i0})' S_i (b_i - \beta_{i0})/\sigma_{i+1,0}^2 \le e_i \qquad (i = 1, 2, \cdots, p - 1); \\ &\text{otherwise reject } \mathcal{H}_0. \end{aligned} \tag{34}$$


The level of significance $\alpha$ for this procedure is given by

$$\alpha = 1 - \Bigl\{\prod_{i=1}^{p} P_i\Bigr\} \Bigl\{\prod_{i=1}^{p-1} P'_i\Bigr\}, \tag{35}$$

where

$$P_i = \int_{c_i}^{d_i} g_{n-i+1}(x)\, dx, \qquad P'_i = G_i(e_i).$$

For a given $\alpha$, the $c_i$, $d_i$, $e_i$'s are not uniquely determined. The arbitrariness may
be removed, for instance, by the further stipulation that

$$P_1 = P_2 = \cdots = P_p = P'_1 = P'_2 = \cdots = P'_{p-1} = \beta \text{ (say)}$$

and that $(c_i, d_i)$ are the locally unbiassed partitioning of the $100(1 - \beta)$ per
cent critical region based on the chi-square distribution with $n - i + 1$ degrees
of freedom. With this choice of the constants $c_i$, $d_i$, $e_i$, the test procedure is
locally unbiassed.

3.2 Two variance-covariance matrices. With two population variance-covariance
matrices $\Sigma^{(1)}$, $\Sigma^{(2)}$ and two matrices of random variables $S^{(1)}$, $S^{(2)}$ distributed
independently in Wishart's form with $n_1$ and $n_2$ degrees of freedom respectively,
so that $S^{(j)}/n_j$ provides an unbiassed estimate for $\Sigma^{(j)}$, we can use the step-down
procedure for testing the hypothesis $\mathcal{H}_0$ that the two variance-covariance matrices
are identical or, in symbols,

$$\mathcal{H}_0 : \Sigma^{(1)} = \Sigma^{(2)},$$

and also set up simultaneous confidence bounds for parameters measuring
deviations from $\mathcal{H}_0$.

Let us introduce the two sets of step-down regression-coefficients and residual
variances: $\beta_i^{(j)}$, $(\sigma_i^{(j)})^2$, $b_i^{(j)}$, and $(s_i^{(j)})^2$. The hypothesis $\mathcal{H}_0$ may be reformulated in
terms of the step-down parameters as follows: $\mathcal{H}_0$ is true if and only if the
hypotheses

$$\begin{aligned} \mathcal{H}_{0i} &: (\sigma_i^{(1)})^2 = (\sigma_i^{(2)})^2, \qquad i = 1, 2, \cdots, p, \\ \mathcal{H}'_{0i} &: \beta_i^{(1)} = \beta_i^{(2)}, \qquad i = 1, 2, \cdots, p - 1, \end{aligned} \tag{36}$$

are simultaneously true. We may take $\rho_i = (\sigma_i^{(1)})^2/(\sigma_i^{(2)})^2$ and $T_i[\delta_i]$ as measures of
deviation from $\mathcal{H}_0$ where $\delta_i = \beta_i^{(1)} - \beta_i^{(2)}$, $T_i$ is a subset of $(1, 2, \cdots, i)$ and
$T_i[\delta_i]$ denotes the $T_i$-norm of $\delta_i$. In this case, it has not been possible to set up
confidence bounds on all these parameters simultaneously. However, one may
proceed as follows. Given pre-assigned positive constants $c_i$, $d_i$; $c_i < d_i$, and
writing

$$r_i = \frac{n_2 - i + 1}{n_1 - i + 1} \cdot \frac{(s_i^{(1)})^2}{(s_i^{(2)})^2}, \tag{37}$$

we find the probability that

$$r_i/d_i \le \rho_i \le r_i/c_i, \qquad i = 1, 2, \cdots, p, \tag{38}$$

should hold simultaneously is given by

$$P = \prod_{i=1}^{p} P_i, \tag{39}$$

where

$$P_i = \int_{c_i}^{d_i} dF_{n_1 - i + 1,\, n_2 - i + 1}(x), \tag{40}$$

in which $F_{m,n}(x)$ stands for the distribution-function of the variance-ratio statistic
with $m$ degrees of freedom for the numerator and $n$ degrees of freedom for the
denominator. Therefore, (38) provides simultaneous confidence-bounds on $\rho_i$
$(i = 1, 2, \cdots, p)$ with probability $P$.

Let us now write $\hat\delta_i = b_i^{(1)} - b_i^{(2)}$ and note that if $S_i^{(1)}$ and $S_i^{(2)}$ are fixed, $\hat\delta_i$ is
distributed in an $i$-variate normal form with expected value $\delta_i$ and
variance-covariance matrix

$$(\sigma_{i+1}^{(1)})^2 (S_i^{(1)})^{-1} + (\sigma_{i+1}^{(2)})^2 (S_i^{(2)})^{-1},$$

distributed independently of $(s_{i+1}^{(1)})^2$ and $(s_{i+1}^{(2)})^2$. If $\mathcal{H}_{0,i+1}$ is true, we have
$(\sigma_{i+1}^{(1)})^2 = (\sigma_{i+1}^{(2)})^2 = \sigma_{i+1}^2$, say. In that case, if $S_i^{(1)}$ and $S_i^{(2)}$ are fixed, $\hat\delta_i$ is distributed in an
$i$-variate normal form with expected value $\delta_i$ and dispersion matrix $C_i \sigma_{i+1}^2$ where

$$C_i = (S_i^{(1)})^{-1} + (S_i^{(2)})^{-1}. \tag{41}$$

Also, $\hat\delta_i$ is distributed independently of $u_1$ and $u_2$ where

$$u_j = (s_{i+1}^{(j)})^2/\sigma_{i+1}^2 \qquad (j = 1, 2) \tag{42}$$

and $u_j$ is distributed as a chi-square with $(n_j - i)$ degrees of freedom.
Consequently, writing

$$s_{i+1}^2 = (s_{i+1}^{(1)})^2 + (s_{i+1}^{(2)})^2 \tag{43}$$

we find that if $\mathcal{H}_{0,i+1}$ is true and $S_i^{(j)}$ are fixed $(j = 1, 2)$ the statistics

$$(\hat\delta_i - \delta_i)' C_i^{-1} (\hat\delta_i - \delta_i)/s_{i+1}^2 \tag{44}$$

and

$$\frac{n_2 - i}{n_1 - i} \cdot \frac{(s_{i+1}^{(1)})^2}{(s_{i+1}^{(2)})^2} \tag{45}$$

are distributed independently as variance-ratios, (44) with $i$ and $(n_1 + n_2 - 2i)$
degrees of freedom, and (45) with $(n_1 - i)$ and $(n_2 - i)$ degrees of freedom.
Therefore, given pre-assigned positive quantities $e_i$ the probability $P'$ that

$$(\hat\delta_i - \delta_i)' C_i^{-1} (\hat\delta_i - \delta_i)/s_{i+1}^2 \le e_i, \qquad i = 1, 2, \cdots, p - 1, \tag{46}$$

should hold simultaneously is equal to

$$P' = \prod_{i=1}^{p-1} P'_i, \tag{47}$$

where

$$P'_i = F_{i,\, n_1 + n_2 - 2i}(e_i) \tag{48}$$

provided $\mathcal{H}_{0i}$ is true for $i = 2, 3, \cdots, p$. From (46), we get the following
simultaneous confidence-bounds (49) on the $T_i$-norms of $\delta_i$ where $T_i$ is a subset of
$(1, 2, \cdots, i)$ (under the highly restrictive condition that $\mathcal{H}_{0i}$ is true for $i = 2,
3, \cdots, p$):

$$T_i[\hat\delta_i] - e_i^{1/2} s_{i+1} \lambda_{\max}^{1/2}(C_{i(T_i)}) \le T_i[\delta_i] \le T_i[\hat\delta_i] + e_i^{1/2} s_{i+1} \lambda_{\max}^{1/2}(C_{i(T_i)}) \tag{49}$$

with probability not less than $P'$, where $C_{i(T_i)}$ is the $T_i$-submatrix of $C_i$.
To test the hypothesis $\mathcal{H}_0$, the step-down procedure suggested is:

$$\begin{aligned} &\text{accept } \mathcal{H}_0 \text{ if} \\ &\qquad \hat\delta_i' C_i^{-1} \hat\delta_i / s_{i+1}^2 \le e_i, \qquad i = 1, 2, \cdots, p - 1, \\ &\qquad c_i \le \frac{n_2 - i + 1}{n_1 - i + 1} \cdot \frac{(s_i^{(1)})^2}{(s_i^{(2)})^2} \le d_i, \qquad i = 1, 2, \cdots, p, \\ &\text{and, otherwise, reject } \mathcal{H}_0, \end{aligned} \tag{50}$$

where $e_i$, $c_i$, $d_i$ $(c_i < d_i)$ are pre-assigned positive constants. The level of
significance $\alpha$ is given by

$$\alpha = 1 - \Bigl\{\prod_{i=1}^{p} P_i\Bigr\} \Bigl\{\prod_{i=1}^{p-1} P'_i\Bigr\}, \tag{51}$$

where $P_i$ is given by (40) and $P'_i$ by (48). For a pre-assigned value of $\alpha$, the
constants $c_i$, $d_i$, $e_i$ are uniquely determined if we stipulate that

$$P_1 = P_2 = \cdots = P_p = P'_1 = P'_2 = \cdots = P'_{p-1} = \beta, \text{ say,}$$

and that $(c_i, d_i)$ gives an unbiassed partitioning of the $100(1 - \beta)$ per cent
critical region of the variance-ratio distribution with $i$ and $n_1 + n_2 - 2i$ degrees
of freedom. With this choice the step-down test is locally unbiassed.

THE LIMITING DISTRIBUTION OF THE SERIAL CORRELATION
COEFFICIENT IN THE EXPLOSIVE CASE


By John S. White


Aero Division, Minneapolis Honeywell Regulator Company, Minneapolis,
Minnesota


1. Introduction and summary. Several authors have studied the discrete
stochastic process $(x_t)$ in which the $x$'s are related by the stochastic difference
equation

$$x_t = \alpha x_{t-1} + u_t, \qquad t = 1, 2, \cdots, T, \tag{1.1}$$

where the $u$'s are unobservable disturbances, independent and identically
distributed with mean zero and variance $\sigma^2$, and $\alpha$ is an unknown parameter.
The statistical problem is to find some appropriate function of the $x$'s as an
estimator for $\alpha$ and examine its properties.
We may rewrite (1.1) as

$$x_t = u_t + \alpha u_{t-1} + \cdots + \alpha^{t-1} u_1 + \alpha^t x_0. \tag{1.2}$$

From (1.2) we see that the distribution of the successive $x$'s is not uniquely
determined by that of the $u$'s alone. The distribution of $x_0$ must also be specified.
Three distributions which have been proposed for $x_0$ are the following:

(A) $x_0$ = a constant (with probability one),

(B) $x_0$ is normally distributed with mean zero and variance $\sigma^2/(1 - \alpha^2)$,

(C) $x_0 = x_T$.

Distribution (B) is perhaps the most appealing from a physical point of view,
since if $x_0$ has this distribution and if the $u$'s are normally distributed, then the
process is stationary (e.g., see Koopmans [4]). However, there are several analytic
difficulties which arise in the statistical treatment of this process. Distribution
(C), the so-called circular distribution, has been proposed as an approximation
to (B) and is much easier to analyze (e.g., see Dixon [2]). Distribution (A) has
been studied extensively by Mann and Wald [5]. An interesting feature of
distribution (A) is that $\alpha$ may assume any finite value, while for distributions
(B) and (C) $\alpha$ must be between $-1$ and $1$. From (1.2) we see that a process
satisfying (1.1) and (A) has

$$\operatorname{var}(x_t) = \sigma^2 (1 + \alpha^2 + \cdots + \alpha^{2(t-1)}). \tag{1.3}$$

If $|\alpha| \ge 1$, $\lim_{t \to \infty} \operatorname{var}(x_t) = \infty$ and the process is said to be "explosive."
Mann and Wald [5] considered only the case $|\alpha| < 1$. They showed that the
least squares estimator for $\alpha$ is the serial correlation coefficient¹

$$\hat\alpha = \frac{\sum x_t x_{t-1}}{\sum x_{t-1}^2} \tag{1.4}$$

and that (for $|\alpha| < 1$) this estimator is asymptotically normally distributed with
mean $\alpha$ and variance $(1 - \alpha^2)/T$. Rubin [6] showed that the estimator $\hat\alpha$ is
consistent (i.e., plim $\hat\alpha = \alpha$) for all $\alpha$.

Received December 30, 1955; revised May 27, 1958.
¹ In this paper, the summation sign $\sum$ will always refer to summation from $t = 1$ to $t = T$.

In this paper the asymptotic distribution of $\hat\alpha$ will be studied under the
assumption that the $u$'s are normally distributed. For $|\alpha| > 1$, it is shown that the
asymptotic distribution of $\hat\alpha$ is the Cauchy distribution. For $|\alpha| = 1$, a moment
generating function is found, the inversion of which will yield the asymptotic
distribution.


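2. The distribution of $\hat\alpha - \alpha$. From equation (1.1) and condition (A) the
joint distribution of

$$x' = (x_1, x_2, \cdots, x_T)$$

is easily found to be

$$f(x') = \frac{\exp[(-1/2\sigma^2) \sum (x_t - \alpha x_{t-1})^2]}{(2\pi\sigma^2)^{T/2}}. \tag{2.1}$$

The maximum likelihood estimator for $\alpha$ is then the least-squares estimator $\hat\alpha$.
Since we shall be considering only the distribution of

$$\hat\alpha = \frac{\sum x_t x_{t-1}}{\sum x_{t-1}^2}$$

we may, without loss of generality, take $\sigma^2 = 1$. For the time being we shall
also set $x_0 = 0$.

We may now write (2.1) in matrix form as follows:

$$f(x') = \frac{\exp(-\tfrac{1}{2} x'Px)}{(2\pi)^{T/2}}, \tag{2.2}$$

where $P$ is the $T \times T$ matrix

$$P = \begin{pmatrix} 1 + \alpha^2 & -\alpha & 0 & \cdots & 0 \\ -\alpha & 1 + \alpha^2 & -\alpha & \cdots & 0 \\ 0 & -\alpha & 1 + \alpha^2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & -\alpha & 1 + \alpha^2 & -\alpha \\ 0 & \cdots & 0 & -\alpha & 1 \end{pmatrix}. \tag{2.3}$$

Since $\hat\alpha$ is a consistent estimator for $\alpha$, we shall consider the distribution of $\hat\alpha - \alpha$
rather than that of $\hat\alpha$ alone. We have

$$\hat\alpha - \alpha = \frac{\sum x_t x_{t-1} - \alpha \sum x_{t-1}^2}{\sum x_{t-1}^2} = \frac{x'Ax}{x'Bx}, \tag{2.4}$$

where $A$ and $B$ are the $T \times T$ matrices

$$A = -\frac{1}{2} \begin{pmatrix} 2\alpha & -1 & 0 & \cdots & 0 \\ -1 & 2\alpha & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2\alpha & -1 \\ 0 & \cdots & 0 & -1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}. \tag{2.5}$$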
Let $m(u, v)$ be the joint moment generating function of $x'Ax$ and $x'Bx$. We
have

$$\begin{aligned} m(u, v) &= E(\exp\{x'Ax\,u + x'Bx\,v\}) \\ &= (2\pi)^{-T/2} \int \exp(x'Ax\,u + x'Bx\,v - x'Px/2)\, dx \\ &= (2\pi)^{-T/2} \int \exp(-x'Dx/2)\, dx, \end{aligned} \tag{2.6}$$

where $D$ is the $T \times T$ matrix

$$D = \begin{pmatrix} p & q & 0 & \cdots & 0 \\ q & p & q & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & q & p & q \\ 0 & \cdots & 0 & q & 1 \end{pmatrix}, \qquad p = 1 + \alpha^2 - 2v + 2\alpha u, \quad q = -(\alpha + u). \tag{2.7}$$

By a well-known integration formula (Cramér [1], Eq. (11.12.2), p. 120) we
have

$$m(u, v) = (2\pi)^{-T/2} \int \exp\left(-\tfrac{1}{2} x'Dx\right) dx = (\det D)^{-1/2}. \tag{2.8}$$

If we now write $\det D = D(T)$, we note that expanding (2.7) by the elements of
the first column gives the difference equation

$$D(T) = pD(T - 1) - q^2 D(T - 2). \tag{2.9}$$

From the initial values $D(1) = 1$ and $D(2) = p - q^2$, we obtain

$$D(T) = \frac{1 - s}{r - s}\, r^T - \frac{1 - r}{r - s}\, s^T, \tag{2.10}$$

where $r$ and $s$ are roots of the equation $x^2 - px + q^2 = 0$, that is

$$r, s = (p \pm \sqrt{p^2 - 4q^2})/2. \tag{2.11}$$

The inversion of $m(u, v) = D(T)^{-1/2}$ seems out of the question for finite $T$. The
inversion of a certain limiting form of $m(u, v)$ will be discussed in Section 4.
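
The difference equation (2.9) and its solution in terms of the roots $r$ and $s$ can be checked numerically. The Python sketch below compares three evaluations of $\det D$: generic Gaussian elimination on the full matrix, the recurrence, and the closed form $[(1 - s) r^T - (1 - r) s^T]/(r - s)$. The closed form as written here follows from (2.9) with $D(1) = 1$ and $D(2) = p - q^2$ and should be read as this sketch's rendering of (2.10). The parameter values $\alpha = 0.5$, $u = 0.1$, $v = 0.05$, $T = 6$ are arbitrary assumptions.

```python
import math


def build_D(T, p, q):
    """T x T matrix of (2.7): diagonal p (last entry 1), off-diagonals q."""
    D = [[0.0] * T for _ in range(T)]
    for i in range(T):
        D[i][i] = p if i < T - 1 else 1.0
        if i + 1 < T:
            D[i][i + 1] = q
            D[i + 1][i] = q
    return D


def det_gauss(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            det = -det
        det *= A[k][k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return det


def det_recurrence(T, p, q):
    """D(T) from (2.9) with D(1) = 1 and D(2) = p - q**2."""
    if T == 1:
        return 1.0
    prev, cur = 1.0, p - q * q
    for _ in range(3, T + 1):
        prev, cur = cur, p * cur - q * q * prev
    return cur


def det_closed_form(T, p, q):
    """D(T) = [(1 - s) r**T - (1 - r) s**T] / (r - s), with r, s from (2.11)."""
    disc = math.sqrt(p * p - 4.0 * q * q)  # assumes p**2 > 4 q**2 (real, distinct roots)
    r, s = (p + disc) / 2.0, (p - disc) / 2.0
    return ((1.0 - s) * r ** T - (1.0 - r) * s ** T) / (r - s)


# Arbitrary illustrative values.
alpha, u, v, T = 0.5, 0.1, 0.05, 6
p = 1.0 + alpha ** 2 - 2.0 * v + 2.0 * alpha * u
q = -(alpha + u)
d_direct = det_gauss(build_D(T, p, q))
d_rec = det_recurrence(T, p, q)
d_closed = det_closed_form(T, p, q)
print(d_direct, d_rec, d_closed)
```

The three values should agree to rounding error; the moment generating function is then $D(T)^{-1/2}$.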


3. The standardizing function $g(T)$. Since $\hat\alpha$ is consistent the limiting
distribution of $\hat\alpha - \alpha$ is the unitary distribution. The first problem then is to find some
function of $T$, say $g(T)$, such that the limiting distribution of $g(T)(\hat\alpha - \alpha)$ is
non-degenerate. We note that the results of Mann and Wald (Eq. (1.4) above)
give $g(T) = (T/\{1 - \alpha^2\})^{1/2}$ for $|\alpha| < 1$, since $(T/\{1 - \alpha^2\})^{1/2} (\hat\alpha - \alpha)$ has a
limiting normal distribution. The function $g^2(T)$ corresponds roughly to the
reciprocal of the asymptotic variance of $(\hat\alpha - \alpha)$, or in Fisher's terminology the
"information" on $\alpha$ supplied by the sample.

The "information" on $\alpha$ may be obtained explicitly as follows. Let $f$ be the
density function (2.1) with $x_0 = 0$ and $\sigma^2 = 1$. The "information," say $I(\alpha)$, is
then defined as

$$I(\alpha) = -E\left(\frac{\partial^2 \log f}{\partial \alpha^2}\right) = E\left(\sum x_{t-1}^2\right) = \begin{cases} \dfrac{1}{1 - \alpha^2}\left(T - \dfrac{1 - \alpha^{2T}}{1 - \alpha^2}\right) & \text{if } |\alpha| \ne 1, \\[2ex] \dfrac{T(T - 1)}{2} & \text{if } |\alpha| = 1. \end{cases} \tag{3.1}$$

If the $x$'s had been independent random variables, then $[I(\alpha)]^{1/2} (\hat\alpha - \alpha)$ would be
asymptotically $N(0, 1)$ (Cramér [1], Eq. (33.3.4), p. 503). This, of course, is
not the case. This approach does, however, give a heuristic method for finding a
function $g(T)$ such that $g(T)(\hat\alpha - \alpha)$ has a non-degenerate limiting distribution.
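
The information $I(\alpha) = E(\sum x_{t-1}^2)$ can be evaluated by summing the variances in (1.3), and its order of magnitude for large $T$ differs sharply across the three cases. The Python sketch below does this under the assumptions $x_0 = 0$ and $\sigma^2 = 1$. The closed form in `information_closed` is derived here from (1.3), and the leading terms $T/(1 - \alpha^2)$, $T^2/2$ and $\alpha^{2T}/(\alpha^2 - 1)^2$ are this sketch's own simplifications of it, not quotations from the paper.

```python
def information_direct(alpha, T):
    """I(alpha) = sum over t = 1..T of var(x_{t-1}), with x_0 = 0 and unit error variance."""
    total = 0.0
    var_x = 0.0  # var(x_0) = 0
    for _ in range(T):
        total += var_x
        var_x = alpha * alpha * var_x + 1.0  # var(x_t) = alpha^2 var(x_{t-1}) + 1
    return total


def information_closed(alpha, T):
    """Closed form of the same sum, obtained from the geometric series in (1.3)."""
    a2 = alpha * alpha
    if a2 == 1.0:
        return T * (T - 1) / 2.0
    return (T - (1.0 - a2 ** T) / (1.0 - a2)) / (1.0 - a2)


def leading_term(alpha, T):
    """Dominant term of I(alpha) for large T in each of the three cases."""
    a2 = alpha * alpha
    if a2 < 1.0:
        return T / (1.0 - a2)
    if a2 == 1.0:
        return T * T / 2.0
    return a2 ** T / (a2 - 1.0) ** 2


# Ratio I(alpha) / leading term for a stable, a unit-root and an explosive case.
ratios = {
    alpha: information_closed(alpha, T) / leading_term(alpha, T)
    for alpha, T in ((0.5, 10000), (1.0, 10000), (2.0, 30))
}
print(ratios)
```

The square roots of the three leading terms grow like $T^{1/2}$, $T$ and $|\alpha|^T$ respectively, which indicates how strongly the required standardization depends on $\alpha$.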

We might now take $g(T) = [I(\alpha)]^{1/2}$; however, it will simplify the computations
to use slight modifications which are asymptotically equivalent to $[I(\alpha)]^{1/2}$. We
choose

$$g(T) = \begin{cases} \sqrt{\dfrac{T}{1 - \alpha^2}} & \text{for } |\alpha| < 1, \\[2ex] \dfrac{T}{\sqrt{2}} & \text{for } |\alpha| = 1, \\[2ex] \dfrac{|\alpha|^T}{\alpha^2 - 1} & \text{for } |\alpha| > 1. \end{cases} \tag{3.2}$$

In the next section it will be shown that $g(T)(\hat\alpha - \alpha)$ has a non-degenerate
limiting distribution for all values of $\alpha$.

4. The limiting distribution of $g(T)(\hat\alpha - \alpha)$. We shall first consider the joint
distribution of $x'Ax/g(T)$ and $x'Bx/g^2(T)$. Let $M(U, V)$ be the joint moment
generating function of these two statistics. We then have


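$$M(U, V) = E[\exp\{x'Ax\,U/g(T) + x'Bx\,V/g^2(T)\}] = m[U/g(T),\ V/g^2(T)], \tag{4.1}$$

where $m(u, v)$ is the joint moment generating function (2.6).
From (2.10) and (2.11) with $g = g(T)$, $u = U/g$ and $v = V/g^2$, we have

$$M(U, V) = D(T)^{-1/2} = \left[\frac{1 - s}{r - s}\, r^T - \frac{1 - r}{r - s}\, s^T\right]^{-1/2}, \tag{4.2}$$

$$r, s = \tfrac{1}{2}\Bigl\{1 + \alpha^2 + 2\alpha U/g - 2V/g^2 \pm \bigl[(1 - \alpha^2)^2 - 4\alpha(1 - \alpha^2)U/g - 4(1 - \alpha^2)U^2/g^2 - 4(1 + \alpha^2)V/g^2 - 8\alpha UV/g^3 + 4V^2/g^4\bigr]^{1/2}\Bigr\}. \tag{4.3}$$

For sufficiently large $T$ and $|\alpha| \ne 1$, we may factor $(1 - \alpha^2)^2$ out of the radical in
(4.3) and expand the remaining radical by the binomial theorem. We then have,
up to terms of order $O(g^{-3})$,

$$r, s = \tfrac{1}{2}\Bigl\{1 + \alpha^2 + 2\alpha U/g - 2V/g^2 \pm \Bigl[1 - \alpha^2 - 2\alpha U/g - \frac{2(1 + \alpha^2)V}{(1 - \alpha^2)g^2} - \frac{2U^2}{(1 - \alpha^2)g^2} + O(g^{-3})\Bigr]\Bigr\}. \tag{4.4}$$

Taking $r$ with the plus sign and $s$ with the minus sign we have

$$r = 1 - \frac{U^2 + 2V}{(1 - \alpha^2)g^2} + O(g^{-3}), \qquad s = \alpha^2 + 2\alpha U/g + \frac{U^2 + 2\alpha^2 V}{(1 - \alpha^2)g^2} + O(g^{-3}). \tag{4.5}$$

Substituting the appropriate values of $g(T)$ from (3.2), we have

$$\left. \begin{aligned} r &= 1 - \frac{U^2 + 2V}{T} + O(T^{-3/2}) \\ s &= \alpha^2 + 2\alpha \sqrt{\frac{1 - \alpha^2}{T}}\, U + \frac{U^2 + 2\alpha^2 V}{T} + O(T^{-3/2}) \end{aligned} \right\} \quad \text{for } |\alpha| < 1, \tag{4.6}$$

$$\left. \begin{aligned} r &= 1 + \frac{(\alpha^2 - 1)(U^2 + 2V)}{\alpha^{2T}} + O(|\alpha|^{-3T}) \\ s &= \alpha^2 + \frac{2\alpha(\alpha^2 - 1)U}{|\alpha|^T} - \frac{(\alpha^2 - 1)(U^2 + 2\alpha^2 V)}{\alpha^{2T}} + O(|\alpha|^{-3T}) \end{aligned} \right\} \quad \text{for } |\alpha| > 1. \tag{4.7}$$

If $|\alpha| = 1$, the expansion in (4.4) is not valid; however, from (4.3), we have

$$\left. \begin{aligned} r &= 1 + \frac{\sqrt{2}\,\alpha U}{T} + \frac{2i\sqrt{V}}{T} + O(T^{-2}) \\ s &= 1 + \frac{\sqrt{2}\,\alpha U}{T} - \frac{2i\sqrt{V}}{T} + O(T^{-2}) \end{aligned} \right\} \quad \text{for } |\alpha| = 1. \tag{4.8}$$

Substituting these results in (4.2), we have

$$\lim M(U, V) = \begin{cases} \exp(V + U^2/2) & \text{for } |\alpha| < 1, \\[1ex] (1 - U^2 - 2V)^{-1/2} & \text{for } |\alpha| > 1, \\[1ex] \Bigl\{\exp(\sqrt{2}\,\alpha U)\Bigl(\cos 2\sqrt{V} - \dfrac{\alpha U}{\sqrt{2V}} \sin 2\sqrt{V}\Bigr)\Bigr\}^{-1/2} & \text{for } |\alpha| = 1. \end{cases} \tag{4.9}$$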

The next problem is to obtain the limiting distribution of $g(T)(\hat\alpha - \alpha)$ from
$\lim M(U, V)$. Since $g(T)(\hat\alpha - \alpha) = g(T)\,x'Ax/x'Bx$, the problem is one of
finding the distribution of the ratio of two random variables. One method of
solution has been proposed by Gurland [3]. Let $X$ and $Y$ be two random variables,
Prob $(Y > 0) = 1$. We wish to determine the distribution of $Z = X/Y$. Let
$W = W_z = X - zY$. Then we have

$$\begin{aligned} \text{Prob}\,(Z < z) &= \text{Prob}\,(X/Y < z) \\ &= \text{Prob}\,(X - zY < 0) \\ &= \text{Prob}\,(W_z < 0). \end{aligned} \tag{4.10}$$

If the distribution of $W$ can be found, the distribution of $Z$ will immediately
follow. Frequently the distribution of $W$ can be found from that of $X$ and $Y$ by
means of moment generating functions. Let

$$m(w) = E(\exp\{Ww\}), \qquad m^*(u, v) = E(\exp\{Xu + Yv\}), \tag{4.11}$$

then

$$m(w) = E(\exp\{X - zY\}w) = E(\exp\{Xw - Yzw\}) = m^*(w, -zw).$$

To apply this technique to the problem at hand, we set $W = x'Ax/g -
z\,x'Bx/g^2$. From (4.1), (4.2) and (4.9) we have

$$m(w) = M(w, -zw),$$

$$\lim m(w) = \begin{cases} \exp(-zw + w^2/2) & \text{for } |\alpha| < 1, \\[1ex] (1 + 2zw - w^2)^{-1/2} & \text{for } |\alpha| > 1, \\[1ex] \Bigl\{\exp(\sqrt{2}\,\alpha w)\Bigl(\cos 2\sqrt{-zw} - \dfrac{\alpha w}{\sqrt{-2zw}} \sin 2\sqrt{-zw}\Bigr)\Bigr\}^{-1/2} & \text{for } |\alpha| = 1. \end{cases} \tag{4.12}$$

The inversion of $\lim m(w)$ is trivial for $|\alpha| < 1$. The moment generating function
$\exp(-zw + w^2/2)$ is immediately recognized as that of a random variable
which is normally distributed with mean $-z$ and variance 1. Hence we have

$$\begin{aligned} \lim \text{Prob}\,(W < 0) &= (2\pi)^{-1/2} \int_{-\infty}^{0} \exp(-\{t + z\}^2/2)\, dt \\ &= (2\pi)^{-1/2} \int_{-\infty}^{z} \exp(-t^2/2)\, dt \\ &= \lim \text{Prob}\,\{g(T)(\hat\alpha - \alpha) < z\}, \end{aligned} \tag{4.13}$$

i.e., $g(T)(\hat\alpha - \alpha)$ is asymptotically normal with mean 0 and variance 1.

For $|\alpha| > 1$, the inverse of $\lim m(w)$ might be obtained directly in terms of
Bessel functions; however, it is more appealing from a statistical point of view to
proceed as follows. Let $X$ and $Y$ be independent chi-squared variables with one
degree of freedom. Then $E(\exp\{Xw\}) = E(\exp\{Yw\}) = (1 - 2w)^{-1/2}$ is their
common moment generating function. Now set $R = aX - bY$; the moment
generating function of $R$ will be

$$m_R(w) = E(\exp\{Rw\}) = E(\exp\{aX - bY\}w) = (\{1 - 2aw\}\{1 + 2bw\})^{-1/2}. \tag{4.14}$$

In particular if we set

$$2a = \sqrt{1 + z^2} - z, \qquad 2b = \sqrt{1 + z^2} + z, \tag{4.15}$$

we have

$$m_R(w) = (1 + 2zw - w^2)^{-1/2} = \lim m(w). \tag{4.16}$$
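
The step from (4.14) and (4.15) to (4.16) is a short piece of algebra, written out here for convenience as a worked check rather than as part of the original argument:

```latex
\begin{aligned}
(1 - 2aw)(1 + 2bw) &= 1 + 2(b - a)\,w - 4ab\,w^2, \\
2(b - a) &= \bigl(\sqrt{1 + z^2} + z\bigr) - \bigl(\sqrt{1 + z^2} - z\bigr) = 2z, \\
4ab &= \bigl(\sqrt{1 + z^2} - z\bigr)\bigl(\sqrt{1 + z^2} + z\bigr) = (1 + z^2) - z^2 = 1, \\
\text{so that}\quad m_R(w) &= \bigl(1 + 2zw - w^2\bigr)^{-1/2}.
\end{aligned}
```

Both $a$ and $b$ are positive for every real $z$, since $\sqrt{1 + z^2} > |z|$, so $R = aX - bY$ is a difference of two positively weighted chi-squared variables.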


Hence, the limiting distribution of $W$, for $|\alpha| > 1$, is the same as the distribution
of $R = aX - bY$. We then have

$$\begin{aligned} \lim \text{Prob}\,(W < 0) &= \text{Prob}\,(aX - bY < 0) \\ &= \text{Prob}\,(X < bY/a) \\ &= \frac{1}{2\pi} \int_0^\infty \int_0^{by/a} \frac{\exp(-x/2 - y/2)}{\sqrt{xy}}\, dx\, dy \\ &= \lim \text{Prob}\,\{g(T)(\hat\alpha - \alpha) < z\} = F(z), \text{ say.} \end{aligned} \tag{4.17}$$

The density function corresponding to $F(z)$ is

$$\begin{aligned} f(z) = \frac{dF(z)}{dz} &= \frac{1}{2\pi} \int_0^\infty \sqrt{a/b}\, \exp(-by/2a - y/2)\, \frac{d(b/a)}{dz}\, dy \\ &= \frac{1}{2\pi} \sqrt{a/b}\; \frac{2}{1 + (b/a)}\; \frac{d(b/a)}{dz} \\ &= \frac{1}{\pi}\, \frac{1}{1 + z^2} \qquad \text{(by (4.15))}. \end{aligned} \tag{4.18}$$

Hence the limiting distribution of $g(T)(\hat\alpha - \alpha)$, for $|\alpha| > 1$, is the Cauchy
distribution.
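
The Cauchy limit can be illustrated by simulation. The sketch below, in Python with the standard library only, draws $g(T)(\hat\alpha - \alpha)$ for $x_0 = 0$ and normal errors with $g(T) = |\alpha|^T/(\alpha^2 - 1)$, and compares the sample quartiles with those of the standard Cauchy distribution, which are $-1$, $0$ and $1$. The values $\alpha = 2$, $T = 25$, the number of replications and the seed are arbitrary assumptions; the identity $\hat\alpha - \alpha = \sum u_t x_{t-1} / \sum x_{t-1}^2$ used to avoid cancellation follows directly from (1.1) and (1.4).

```python
import random


def scaled_error(alpha, T, rng):
    """One draw of g(T) * (alpha_hat - alpha) for x_0 = 0, N(0, 1) errors, |alpha| > 1."""
    x_prev = 0.0
    num = 0.0  # sum of u_t * x_{t-1}
    den = 0.0  # sum of x_{t-1}^2
    for _ in range(T):
        u = rng.gauss(0.0, 1.0)
        num += u * x_prev
        den += x_prev * x_prev
        x_prev = alpha * x_prev + u
    g = abs(alpha) ** T / (alpha * alpha - 1.0)
    return g * num / den


def quantile(sorted_values, prob):
    """Simple empirical quantile of an already sorted list."""
    return sorted_values[int(prob * (len(sorted_values) - 1))]


rng = random.Random(12345)  # fixed seed so that the run is reproducible
draws = sorted(scaled_error(2.0, 25, rng) for _ in range(4000))
q25, q50, q75 = (quantile(draws, prob) for prob in (0.25, 0.5, 0.75))
print(q25, q50, q75)
```

With a few thousand replications the sample quartiles should lie within roughly a tenth of $-1$, $0$ and $1$. The sample mean, by contrast, does not settle down, as expected for a distribution without moments.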

We have been unable to invert $\lim m(w)$ when $|\alpha| = 1$. In the next section
certain results concerning this limit and more general problems of this type will
be discussed.

If we now let $x_0 = c$, a non-zero constant, the analysis proceeds much as before.
Let $A$, $B$, $P$, and $D$ be the $T \times T$ matrices defined in (2.3), (2.5) and (2.7). We
then have, analogous to (2.1) and (2.4),

$$f(x') = (2\pi)^{-T/2} \exp(c\alpha x_1 - \alpha^2 c^2/2 - x'Px/2), \qquad \hat\alpha - \alpha = \frac{x'Ax + cx_1 - \alpha c^2}{x'Bx + c^2}. \tag{4.19}$$

The joint moment generating function of $x'Ax + cx_1 - \alpha c^2$ and $x'Bx + c^2$ is

$$\begin{aligned} m(u, v) &= E(\exp\{(x'Ax + cx_1 - \alpha c^2)u + (x'Bx + c^2)v\}) \\ &= \exp\Bigl(c^2 v - c^2 \alpha u - \frac{\alpha^2 c^2}{2}\Bigr) (2\pi)^{-T/2} \int \exp\Bigl((u + \alpha)c x_1 - \frac{x'Dx}{2}\Bigr) dx \\ &= \exp\Bigl(c^2 v - c^2 \alpha u - \frac{\alpha^2 c^2}{2}\Bigr) \exp\Bigl\{\frac{(u + \alpha)^2 c^2\, D'(T - 1)}{2 D(T)}\Bigr\} D(T)^{-1/2}, \end{aligned} \tag{4.20}$$

$$\lim m(U/g, V/g^2) = \lim M(U, V) = \lim D(T)^{-1/2} \exp\Bigl(-\frac{\alpha^2 c^2}{2}\Bigl[1 - \frac{D'(T - 1)}{D(T)}\Bigr]\Bigr), \tag{4.21}$$

where $D(T)$ is as defined in (4.2) while $D'(T - 1)$ is defined in a similar fashion
but with $g = g(T)$.
For $|\alpha| \le 1$, it follows from (4.6) and (4.8) that, since $g(T)$ and $g(T - 1)$ are
of the same order,

$$\lim D(T) = \lim D'(T - 1)$$

and hence

$$\lim m(U/g, V/g^2) = \lim M(U, V) = \lim D(T)^{-1/2}. \tag{4.22}$$

We see that this limit is the same as that for $x_0 = 0$ as given in (4.9) and hence
the limiting distribution of $g(T)(\hat\alpha - \alpha)$ does not depend on the initial value
$x_0$ for $|\alpha| \le 1$.


For $|\alpha| > 1$ we have, from (4.7),

$$\lim D(T) = 1 - (U^2 + 2V), \qquad \lim D'(T - 1) = 1 - \frac{U^2 + 2V}{\alpha^2}, \tag{4.23}$$

and in place of (4.22) we have

$$\begin{aligned} \lim M(U, V) &= \lim D(T)^{-1/2} \exp\Bigl(-\frac{\alpha^2 c^2}{2}\Bigl[1 - \frac{D'(T - 1)}{D(T)}\Bigr]\Bigr) \\ &= (1 - U^2 - 2V)^{-1/2} \exp\Bigl\{\frac{(\alpha^2 - 1)c^2}{2} \cdot \frac{U^2 + 2V}{1 - U^2 - 2V}\Bigr\}. \end{aligned} \tag{4.24}$$

This moment generating function may be inverted by the methods of Section 4 
to give 


5 2 Ee “ q r 1 c(a’ — 1) 
(4.2% f(x) = -—> = : , = ’ 
ee Ke /a(l + 2) & (, + 3) rk + 3) . 2 


as the limiting distribution of $g(T)(\hat\alpha - \alpha)$. We note that for $c = 0$, $f(x)$ is the
Cauchy distribution as obtained in (4.18).


5. Final remarks. The results of Mann and Wald [5] show that the limiting
distribution of $g(T)(\hat\alpha - \alpha)$, for $|\alpha| < 1$, is also $N(0, 1)$ if, rather than assuming
that the "errors" $u_t$ are normally distributed, we merely assume that all of the
moments of the $u$'s are finite. This is another example of an invariance principle
which seems to hold quite generally for the limiting distributions of functions of
random variables. Roughly speaking, there seems to be an unproved (and
unstated) theorem that the limiting distribution of a function of a sequence of
independent random variables, with suitable restrictions on these random
variables, depends only on the form of the function and is the same as the distribution
of a related functional on a stochastic process.

A general result of this form is Donsker's Theorem [7] which gives the limiting
distribution of any function of sums of independent identically distributed
random variables with finite variances as the distribution of a corresponding
functional on the Wiener process. It is conjectured that this type of reasoning
will show that the results of Mann and Wald will still hold if the $u$'s are merely
assumed to have finite variances.

For $\alpha = 1$, application of Donsker's Theorem shows that the limiting
distribution of $g(T)(\hat\alpha - \alpha)$ is the same as the distribution of the functional

1 
; I x(t) dx(t) 11) — 2 
G[x(-)] Sa . = a - 
x(t) dt [ z(t) dt 
“0 “0 
on the Wiener process, independent of the distribution of the $u$'s. This
distribution will be considered in a future paper.

A LIMIT THEOREM FOR THE PERIODOGRAM
By Salomon Bochner¹ and Tatsuo Kawata²


1. Introduction. Let $\xi(t)$ be a real stationary process in the wide sense with
mean 0 and let its covariance function and spectral function be $\rho(u)$, $F(x)$
respectively. We assume that $F(x)$ is absolutely continuous and has a spectral
density function $p(x)$. The second-named author, [1], has discussed the
periodogram


> 


7 sia 1 | F. > : ugt 
(1.1) I(T) = int | | 80 at} , 


in case &(t) is stationary even of the fourth order, so that the expectation 
E&(t)E(t + w&(t + v)&(t + w) P(u, v, w) 


exists and is a function of u, v, w alone. It was also assumed that the function 
Q(u, v, w), which is the difference between P’(u, v, w) and the corresponding 
fourth moment of a stationary Gaussian process, is the Fourier transform of a 
function and that the latter function satisfies the Lipschitz condition. Under 
these assumptions it has proven that (1.1) does not converge in mean to any 
random variable as T — , but that the covariance function of J(7') and J(T’) 
does tend to a limit whenever T and 7” both tend to infinity in a certain related 
manner, and the limiting value of the covariance function was determined. 

The paper involved a rather troublesome manipulation of a Fourier integral, 
but we have found since that under somewhat different assumptions the compli- 
cations can be reduced appreciably. In a separate publication, [2], a certain 
integral transformation was investigated on its own merit, and in the present 
paper an application of the somewhat modified approach will be made to the 
problem of the periodogram. The expression (1.1) will be replaced by a more 
general one, and as regards the difference function $Q(u, v, w)$ the assumptions will 
be modified as follows. We add expressly the requirement that $Q(u, v, w)$ shall be 
integrable in $E_3$, but the requirement that its Fourier transform shall satisfy 
the Lipschitz condition is omitted entirely. 


2. The Theorem. We shall consider the random variable 

(2.1)  $S(T) = \dfrac{1}{T}\left|\int_{-\infty}^{\infty} \varepsilon(t)\,M\!\left(\dfrac{t}{T}\right)e^{-i\xi t}\,dt\right|^2$ 

in place of (1.1). We shall call (2.1) a generalized periodogram of $\varepsilon(t)$. 
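
A generalized periodogram of this kind can be approximated numerically by a Riemann sum. The following is a minimal sketch, not the authors' procedure: it assumes the sample path is observed on the grid $t = k\,dt$, $k = 0, 1, \dots$, that the weight function vanishes outside $[0, 1)$, and it takes the sign of the exponent as $e^{-i\xi t}$. The names `generalized_periodogram` and `boxcar` are hypothetical.

```python
import cmath

def generalized_periodogram(samples, dt, T, xi, window):
    """Riemann-sum approximation of (1/T) * |integral eps(t) M(t/T) e^{-i xi t} dt|^2.

    samples[k] is taken to be eps(k * dt); `window` plays the role of M.
    """
    total = 0j
    for k, eps in enumerate(samples):
        t = k * dt
        total += eps * window(t / T) * cmath.exp(-1j * xi * t) * dt
    return abs(total) ** 2 / T

def boxcar(a):
    """Indicator of [0, 1): with this choice the integral runs over [0, T)."""
    return 1.0 if 0.0 <= a < 1.0 else 0.0
```

With `boxcar` as the weight function the statistic reduces, up to a constant factor, to a classical periodogram over $[0, T)$; smoother windows can be passed in the same way.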
Let us assume that 

(2.2)  $P(s_1, s_2, s_3) = Q(s_1, s_2, s_3) + P_G(s_1, s_2, s_3),$ 

where 

(2.3)  $P_G(s_1, s_2, s_3) = \rho(s_1)\rho(s_2 - s_3) + \rho(s_2)\rho(s_3 - s_1) + \rho(s_3)\rho(s_1 - s_2).$ 

If $\varepsilon(t)$ is a stationary Gaussian process, then $Q(s_1, s_2, s_3) \equiv 0$. This assumption 
was set up first by Magness [3]; see also Parzen [4]. 

We assume further that 

(2.4)  $Q(s_1, s_2, s_3) \in L_1(E_3),$ 

and that the Fourier transform of $Q(s_1, s_2, s_3)$ is also in $L_1(E_3)$, so that 

(2.5)  $q(x_1, x_2, x_3) = \int_{E_3} e^{-i(s\cdot x)}\,Q(s_1, s_2, s_3)\,dv_s,$ 

(2.6)  $Q(s_1, s_2, s_3) = (2\pi)^{-3}\int_{E_3} e^{i(s\cdot x)}\,q(x_1, x_2, x_3)\,dv_x,$ 

where $E_k$ denotes the whole Euclidean space of $k$ dimensions and 
$(s\cdot x) = s_1x_1 + s_2x_2 + s_3x_3$. 

Under these conditions, we obtain the following theorem. 

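THEOREM. Let $M(\alpha)$ be bounded and integrable in $(-\infty, \infty)$ and let the Fourier 
transform $K(x)$ of $M(\alpha)$, 

$K(x) = \int_{-\infty}^{\infty} e^{ix\alpha}M(\alpha)\,d\alpha,$ 

satisfy 

(2.7)  $K(x) = O(|x|^{-1})$, as $|x| \to \infty$. 

Then we have, as $T_1$ and $T_2$ tend to infinity such that $T_1/T_2 \to \mu$, $\mu \neq 0$, 

(2.8)  $\lim \operatorname{cov}\{S(T_1), S(T_2)\} = \begin{cases} (2\pi)^2\big(|C_\mu^{(1)}|^2 + |C_\mu^{(2)}|^2\big)\,p^2(0), & \text{if } \xi = 0, \\ (2\pi)^2\,|C_\mu^{(2)}|^2\,p^2(\xi), & \text{if } \xi \neq 0, \end{cases}$ 

and 

(2.9)  $\lim E\{S(T_1) - S(T_2)\}^2 = \begin{cases} 2(2\pi)^2\big(|C_1^{(1)}|^2 + |C_1^{(2)}|^2 - |C_\mu^{(1)}|^2 - |C_\mu^{(2)}|^2\big)\,p^2(0), & \text{if } \xi = 0, \\ 2(2\pi)^2\big(|C_1^{(2)}|^2 - |C_\mu^{(2)}|^2\big)\,p^2(\xi), & \text{if } \xi \neq 0, \end{cases}$ 

provided that $p(x)$ is continuous at $\xi$, and the constants $C_\mu^{(j)}$ $(j = 1, 2)$ are given by 

$C_\mu^{(1)} = \mu^{1/2}\int_{-\infty}^{\infty} M(\alpha)M(\mu\alpha)\,d\alpha,$ 

$C_\mu^{(2)} = \mu^{1/2}\int_{-\infty}^{\infty} M(\alpha)\overline{M(\mu\alpha)}\,d\alpha.$ 

We add a remark. If $\mu \to \infty$, or $\mu \to 0$, then $C_\mu^{(1)}$, $C_\mu^{(2)}$ converge to 0. This is 
easily seen from the fact that $C_\mu^{(1)} = C_{1/\mu}^{(1)}$, $\overline{C_\mu^{(2)}} = C_{1/\mu}^{(2)}$, and 
$|C_\mu^{(j)}| \le \mu^{1/2}M\int_{-\infty}^{\infty}|M(\alpha)|\,d\alpha \to 0$ $(\mu \to 0)$, $M$ being an upper bound of $|M(\alpha)|$. 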
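
The behaviour of these constants can be checked numerically for a simple real weight function, for which $C_\mu^{(1)} = C_\mu^{(2)}$. The sketch below assumes the indicator of $[0, 1)$ as $M$ and uses a midpoint rule on a finite interval; the function names and the integration limits are illustrative choices, not part of the paper. For this $M$ the integral equals $\min(1, 1/\mu)$, so the constant is $\mu^{1/2}\min(1, 1/\mu)$.

```python
def c_mu(window, mu, lo=-5.0, hi=5.0, n=20_000):
    """Midpoint-rule approximation of mu**0.5 * integral of M(a) * M(mu * a) da
    for a real weight function M supported inside [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        a = lo + (k + 0.5) * h
        total += window(a) * window(mu * a)
    return mu ** 0.5 * total * h

def boxcar(a):
    """Indicator of [0, 1), an assumed example of a weight function."""
    return 1.0 if 0.0 <= a < 1.0 else 0.0
```

Evaluating `c_mu(boxcar, mu)` at `mu` and `1 / mu` illustrates the symmetry noted in the remark, and large or small `mu` gives values near zero.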
We also note that the theorem implies that the constant 

$|C_1^{(1)}|^2 + |C_1^{(2)}|^2 - |C_\mu^{(1)}|^2 - |C_\mu^{(2)}|^2$ 

must be non-negative. This can also be established directly by verifying that 
it is the value of the double integral 

$\dfrac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big[\,|A(\alpha, \beta)|^2 + (A(\alpha, \beta))^2\,\big]\,d\alpha\,d\beta,$ 

where 

$A(\alpha, \beta) = M(\alpha)\overline{M(\beta)} - \mu M(\mu\alpha)\overline{M(\mu\beta)}.$ 

For the proof of the theorem, we first of all state as a lemma, a theorem given 
in [2]. 

LEMMA 1. Let $M_j(\alpha)$ $(j = 0, 1, \dots, k)$ be bounded and integrable over $(-\infty, \infty)$ 
and let their Fourier transforms be 

$K_j(x) = \int_{-\infty}^{\infty} e^{ix\alpha}M_j(\alpha)\,d\alpha, \quad (j = 0, 1, \dots, k).$ 

Put 

$K(x_1, \dots, x_k;\, T_0, T_1, \dots, T_k) = \prod_{j=0}^{k} T_j \cdot K_0\big(T_0(x_1 + x_2 + \cdots + x_k)\big)\prod_{j=1}^{k} K_j(T_jx_j)$ 

for any positive numbers $T_0, T_1, \dots, T_k$. Then we have 

$\lim \dfrac{1}{T_0}\int_{E_k} K(x_1, \dots, x_k;\, T_0, T_1, \dots, T_k)\,f(x_1, \dots, x_k)\,dv_x = C_k(2\pi)^k f(0, \dots, 0),$ 

if the $T_j$ go to infinity such that $T_0/T_j \to \mu_j$ and $\mu_j \neq 0$ $(j = 1, 2, \dots, k)$ and 
$f(x_1, \dots, x_k)$ satisfies the conditions that the function $f(x_1, \dots, x_k)$ is continuous 
and belongs to $L_1(E_k)$ and its Fourier transform 

$g(\alpha_1, \alpha_2, \dots, \alpha_k) = \int_{E_k} e^{i(\alpha\cdot x)}f(x_1, \dots, x_k)\,dv_x$ 

likewise belongs to $L_1(E_k)$. The constant $C_k$ is 

$C_k = \int_{-\infty}^{\infty} M_0(\alpha)\prod_{j=1}^{k} M_j(-\mu_j\alpha)\,d\alpha.$ 
2 j=l 


3. A lemma. For the proof of the theorem, we need one more lemma. 
LEMMA 2. Let $K_j(x)$ $(j = 1, 2)$ be a bounded function which is the Fourier 
transform of a bounded and integrable function $M_j(\alpha)$, 

(3.1)  $K_j(x) = \int_{-\infty}^{\infty} M_j(\alpha)e^{ix\alpha}\,d\alpha, \quad j = 1, 2,$ 

and let us assume that 

(3.2)  $K_j(x) = O(|x|^{-1})$, as $|x| \to \infty$, $j = 1, 2$. 

(i) If $p(x) \in L_1(-\infty, \infty)$ and is continuous at $-\xi$, then 

(3.3)  $(T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1(x + \xi))K_2(T_2(x + \xi))\,p(x)\,dx$ 

converges to 

$2\pi\,C_\mu\,p(-\xi),$ 

when $T_1, T_2 \to \infty$ and $T_1/T_2 \to \mu$ and $\mu \neq 0$, where 

$C_\mu = \mu^{1/2}\int_{-\infty}^{\infty} M_1(\beta)M_2(-\mu\beta)\,d\beta.$ 

(ii) If $p(x) \in L_1(-\infty, \infty)$, and $p(x)$ is continuous at $-\xi_1$, then 

(3.4)  $(T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1[T_1(x + \xi_1)]K_2[T_2(x + \xi_2)]\,p(x)\,dx$ 

converges to zero when $T_1, T_2 \to \infty$ such that $T_1/T_2 \to \mu$ and $\mu \neq 0$ and $\xi_1 \neq \xi_2$. 
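PROOF. (i) We consider the integral 

(3.5)  $(T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1x)K_2(T_2x)\,dx,$ 

which is absolutely convergent because $K_1$, $K_2$ are bounded and satisfy (3.2). By 
the Parseval theorem, since $K_j(x) \in L_2(-\infty, \infty)$, we have 

$(T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1x)K_2(T_2x)\,dx = \dfrac{2\pi}{(T_1T_2)^{1/2}}\int_{-\infty}^{\infty} M_1\!\left(\dfrac{\alpha}{T_1}\right)M_2\!\left(-\dfrac{\alpha}{T_2}\right)d\alpha$ 

$\qquad = 2\pi(T_1/T_2)^{1/2}\int_{-\infty}^{\infty} M_1(\beta)M_2\!\left(-\dfrac{T_1}{T_2}\beta\right)d\beta.$ 

This converges to 

$2\pi C_\mu = 2\pi\mu^{1/2}\int_{-\infty}^{\infty} M_1(\beta)M_2(-\mu\beta)\,d\beta,$ 

as is easily seen from the fact that 

$\int_{-\infty}^{\infty} |M_2(a\beta) - M_2(a_0\beta)|\,d\beta \to 0,$ 

if $a \to a_0$ and $a_0 \neq 0$. 
Hence it suffices to show that 

(3.6)  $I = (T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1x)K_2(T_2x)\{p(x - \xi) - p(-\xi)\}\,dx$ 

converges to zero. 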
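We divide $I$ into two parts: 

$I = \int_{|x| < \delta} + \int_{|x| \ge \delta} = I_1 + I_2,$ 

where $\delta$ is taken so that $|p(x - \xi) - p(-\xi)| < \epsilon$, for $|x| < \delta$, $\epsilon$ being any 
assigned positive number. 
We have 

(3.7)  $|I_1| \le \epsilon\,(T_1T_2)^{1/2}\int_{|x| < \delta} |K_1(T_1x)K_2(T_2x)|\,dx \le \epsilon\,(T_2/T_1)^{1/2}\int_{-\infty}^{\infty}\left|K_1(u)K_2\!\left(\dfrac{T_2}{T_1}u\right)\right|du \le C\epsilon\int_{-\infty}^{\infty}\dfrac{du}{1 + u^2}$ 

for some constant $C$, as follows from (3.2). 
Next we have 

$|I_2| \le (T_1T_2)^{1/2}\int_{|x| > \delta} |K_1(T_1x)K_2(T_2x)|\,|p(x - \xi)|\,dx + |p(-\xi)|\,(T_1T_2)^{1/2}\int_{|x| > \delta} |K_1(T_1x)K_2(T_2x)|\,dx$ 

$\qquad \le \dfrac{C}{(T_1T_2)^{1/2}}\int_{|x| > \delta}\dfrac{|p(x - \xi)|}{x^2}\,dx + \dfrac{C\,|p(-\xi)|}{(T_1T_2)^{1/2}}\int_{|x| > \delta}\dfrac{dx}{x^2},$ 

for some constant $C$. Hence we get 
(3.8)  $I_2 = o(1)$ 
as $T_1, T_2 \to \infty$, and this together with (3.7) proves (i). 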


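We shall now prove (ii). We have 

(3.9)  $(T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1(x + \xi_1))K_2(T_2(x + \xi_2))\,dx = \dfrac{2\pi}{(T_1T_2)^{1/2}}\int_{-\infty}^{\infty} M_1\!\left(\dfrac{\alpha}{T_1}\right)M_2\!\left(-\dfrac{\alpha}{T_2}\right)e^{i\alpha(\xi_1 - \xi_2)}\,d\alpha$ 

$\qquad = 2\pi(T_1/T_2)^{1/2}\int_{-\infty}^{\infty} M_1(\beta)M_2\!\left(-\dfrac{T_1}{T_2}\beta\right)e^{iT_1\beta(\xi_1 - \xi_2)}\,d\beta,$ 

and the difference between this and the expression 

(3.10)  $2\pi(T_1/T_2)^{1/2}\int_{-\infty}^{\infty} M_1(\beta)M_2(-\mu\beta)\,e^{iT_1\beta(\xi_1 - \xi_2)}\,d\beta$ 

is in absolute value 

$\le 2\pi(T_1/T_2)^{1/2}\int_{-\infty}^{\infty} |M_1(\beta)|\,\left|M_2\!\left(-\dfrac{T_1}{T_2}\beta\right) - M_2(-\mu\beta)\right|d\beta \le C\int_{-\infty}^{\infty}\left|M_2\!\left(-\dfrac{T_1}{T_2}\beta\right) - M_2(-\mu\beta)\right|d\beta.$ 

But this is as small as we please, for $T_1$, $T_2$ large and $T_1/T_2$ near to $\mu$, provided 
$\mu \neq 0$. 

Now (3.10) tends to zero by the Riemann-Lebesgue lemma, and we conclude that 
(3.9) tends to zero also. 
It suffices, then, to show that 

(3.11)  $J = (T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1[T_1(x + \xi_1)]K_2[T_2(x + \xi_2)]\{p(x) - p(-\xi_1)\}\,dx$ 

converges to zero. 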
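We have 

(3.12)  $J = (T_1T_2)^{1/2}\int_{-\infty}^{\infty} K_1(T_1y)K_2\{T_2(y - (\xi_1 - \xi_2))\}\{p(y - \xi_1) - p(-\xi_1)\}\,dy = (T_1T_2)^{1/2}\left[\int_{|y| < \delta} + \int_{|y| > \delta}\right] = J_1 + J_2,$ 

say. Here $\delta$ is so chosen that 

(3.13)  $|p(y - \xi_1) - p(-\xi_1)| < \epsilon,$ 

for $|y| < \delta$ and 

(3.14)  $|\xi_1 - \xi_2| - \delta > c > 0,$ 

for some positive constant $c$. Then 

(3.15)  $|J_1| \le (T_1T_2)^{1/2}\,\epsilon\int_{|y| < \delta} |K_1(T_1y)K_2(T_2y - T_2(\xi_1 - \xi_2))|\,dy \le (T_1T_2)^{1/2}\,\epsilon\,C\int_{|y| < \delta}\dfrac{dy}{T_2(|\xi_1 - \xi_2| - |y|)} \le (T_1/T_2)^{1/2}\,C\,\epsilon\,c^{-1}\delta \le C\epsilon,$ 

for some constant $C$ by (3.13) and (3.14). 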
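Next we shall consider $J_2$. We divide $J_2$ further into two parts, 

$J_2 = (T_1T_2)^{1/2}\int_{|y| > \delta,\ |y - (\xi_1 - \xi_2)| > \eta} + (T_1T_2)^{1/2}\int_{|y| > \delta,\ |y - (\xi_1 - \xi_2)| < \eta} = J_{21} + J_{22},$ 

say, where $0 < \eta < \tfrac12|\xi_1 - \xi_2|$. Then 

(3.16)  $|J_{21}| \le (T_1T_2)^{1/2}\,C\int_{|y| > \delta,\ |y - (\xi_1 - \xi_2)| > \eta}\dfrac{1}{T_1|y|}\cdot\dfrac{1}{T_2|y - (\xi_1 - \xi_2)|}\,\big(|p(y - \xi_1)| + |p(-\xi_1)|\big)\,dy$ 

$\qquad \le \dfrac{C}{(T_1T_2)^{1/2}\,\delta}\int_{|y| > \delta,\ |y - (\xi_1 - \xi_2)| > \eta}\dfrac{|p(y - \xi_1)| + |p(-\xi_1)|}{|y - (\xi_1 - \xi_2)|}\,dy,$ 

which converges to zero as $T_1, T_2 \to \infty$, since the integral is finite. Moreover 

(3.17)  $|J_{22}| \le (T_1T_2)^{1/2}\,C\int_{(\xi_1 - \xi_2) - \eta < y < (\xi_1 - \xi_2) + \eta}\dfrac{1}{T_1|y|}\,\big(|p(y - \xi_1)| + |p(-\xi_1)|\big)\,dy$ 

$\qquad \le (T_2/T_1)^{1/2}\,\dfrac{C}{|\xi_1 - \xi_2| - \eta}\int_{(\xi_1 - \xi_2) - \eta}^{(\xi_1 - \xi_2) + \eta}\big(|p(y - \xi_1)| + |p(-\xi_1)|\big)\,dy \le C\int_{(\xi_1 - \xi_2) - \eta}^{(\xi_1 - \xi_2) + \eta}\big(|p(y - \xi_1)| + |p(-\xi_1)|\big)\,dy.$ 

Hence the limit superior of (3.17), as $T_1, T_2 \to \infty$ with $T_1/T_2 \to \mu$, is small for $\eta$ small, that is 

(3.18)  $\lim J_{22} = 0.$ 

From (3.16), (3.18) we obtain 

$\lim J_2 = 0,$ 

which together with (3.15) gives $\lim J = 0$. 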


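4. Proof of the theorem. We now proceed to prove the theorem stated in 
Section 2. 
We start with the computation of 

$ES(T_1)S(T_2) = \dfrac{1}{T_1T_2}\,E\left|\int_{-\infty}^{\infty}\varepsilon(t)M\!\left(\dfrac{t}{T_1}\right)e^{-i\xi t}\,dt\right|^2\left|\int_{-\infty}^{\infty}\varepsilon(t)M\!\left(\dfrac{t}{T_2}\right)e^{-i\xi t}\,dt\right|^2$ 

$\quad = \dfrac{1}{T_1T_2}\,E\int_{E_4}\varepsilon(t_1)\varepsilon(t_2)\varepsilon(t_3)\varepsilon(t_4)\,e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)dv_t$ 

$\quad = \dfrac{1}{T_1T_2}\int_{E_4} P(t_2 - t_1,\, t_3 - t_1,\, t_4 - t_1)\,e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)dv_t$ 

$\quad = \dfrac{1}{T_1T_2}\int_{E_4} Q(t_2 - t_1,\, t_3 - t_1,\, t_4 - t_1)\,e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)dv_t$ 

$\qquad + \dfrac{1}{T_1T_2}\int_{E_4} P_G(t_2 - t_1,\, t_3 - t_1,\, t_4 - t_1)\,e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)dv_t$ 

$\quad = S_1(T_1, T_2) + S_2(T_1, T_2).$ 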
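Inserting (2.6) in $S_1(T_1, T_2)$, we have 

(4.1)  $S_1(T_1, T_2) = \dfrac{(2\pi)^{-3}}{T_1T_2}\int_{E_4} M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,dv_t \int_{E_3} q(x_1, x_2, x_3)\exp[i(t_2 - t_1)x_1 + i(t_3 - t_1)x_2 + i(t_4 - t_1)x_3]\,dv_x$ 

$\quad = \dfrac{(2\pi)^{-3}}{T_1T_2}\int_{E_3} q(x_1, x_2, x_3)\,dv_x \int_{E_4} M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)\exp[i\{-t_1(x_1 + x_2 + x_3 + \xi) + t_2(x_1 + \xi) + t_3(x_2 - \xi) + t_4(x_3 + \xi)\}]\,dv_t$ 

$\quad = (2\pi)^{-3}\,T_1T_2\int_{E_3} q(x_1, x_2, x_3)\,K[-T_1(x_1 + x_2 + x_3 + \xi)]\,\overline{K}[-T_1(x_1 + \xi)]\,K[T_2(x_2 - \xi)]\,\overline{K}[-T_2(x_3 + \xi)]\,dv_x$ 

$\quad = (2\pi)^{-3}\,T_1T_2\int_{E_3} q(x_1 - \xi,\, x_2 + \xi,\, x_3 - \xi)\,K[-T_1(x_1 + x_2 + x_3)]\,\overline{K}(-T_1x_1)\,K(T_2x_2)\,\overline{K}(-T_2x_3)\,dv_x,$ 

where we denote 

(4.2)  $K(x) = \int_{-\infty}^{\infty} M(\alpha)e^{ix\alpha}\,d\alpha.$ 

Since $M(x)$ and $q(x_1, x_2, x_3)$ satisfy the condition of Lemma 1, we obtain that 
(4.1) multiplied by $T_2$ is convergent when $T_1/T_2 \to \mu$ $(\mu \neq 0)$. Hence (4.1) 
converges to zero. 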
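Next we shall consider $S_2(T_1, T_2)$. Inserting (2.3), we obtain 

(4.4)  $S_2(T_1, T_2) = \dfrac{1}{T_1T_2}\int_{E_4}\{\rho(t_2 - t_1)\rho(t_3 - t_4) + \rho(t_3 - t_1)\rho(t_4 - t_2) + \rho(t_4 - t_1)\rho(t_2 - t_3)\}\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,dv_t$ 

$\quad = U_1(T_1, T_2) + U_2(T_1, T_2) + U_3(T_1, T_2),$ 

say, where 

(4.5)  $U_1(T_1, T_2) = \dfrac{1}{T_1T_2}\int_{E_4}\rho(t_2 - t_1)\rho(t_3 - t_4)\,M\!\left(\dfrac{t_1}{T_1}\right)\overline{M}\!\left(\dfrac{t_2}{T_1}\right)M\!\left(\dfrac{t_3}{T_2}\right)\overline{M}\!\left(\dfrac{t_4}{T_2}\right)e^{-i\xi(t_1 - t_2 + t_3 - t_4)}\,dv_t$ 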
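and $U_2$, $U_3$ are similar terms. By the assumptions of the theorem, we have 
$\rho(u) = \int_{-\infty}^{\infty} e^{iux}p(x)\,dx$, and, if we insert this into (4.5), we obtain 

$U_1(T_1, T_2) = \dfrac{1}{T_1T_2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x)p(y)\,dx\,dy \int_{E_4} M\!\left(\dfrac{t_1}{T_1}\right)e^{-it_1(x + \xi)}\,\overline{M}\!\left(\dfrac{t_2}{T_1}\right)e^{it_2(x + \xi)}\,M\!\left(\dfrac{t_3}{T_2}\right)e^{it_3(y - \xi)}\,\overline{M}\!\left(\dfrac{t_4}{T_2}\right)e^{-it_4(y - \xi)}\,dv_t$ 

$\quad = T_1T_2\int_{-\infty}^{\infty} p(x)\,K[-T_1(x + \xi)]\,\overline{K}[-T_1(x + \xi)]\,dx \cdot \int_{-\infty}^{\infty} p(y)\,K[T_2(y - \xi)]\,\overline{K}[T_2(y - \xi)]\,dy.$ 

Since $\varepsilon(t)$ is real, $\rho$ is real too, and $p(x)$ is an even function, and hence by Lemma 
2, we get 

(4.6)  $\lim_{T_1, T_2 \to \infty} U_1(T_1, T_2) = (2\pi)^2C_1^2\,p(\xi)p(-\xi) = (2\pi C_1)^2p^2(\xi),$ 

where 

(4.7)  $C_1 = \int_{-\infty}^{\infty} M(\beta)\overline{M(\beta)}\,d\beta = \int_{-\infty}^{\infty} |M(\beta)|^2\,d\beta.$ 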


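Quite similarly 

$U_2(T_1, T_2) = T_1T_2\int_{-\infty}^{\infty} p(x)\,K[-T_1(x + \xi)]\,K[T_2(x - \xi)]\,dx \cdot \int_{-\infty}^{\infty} p(y)\,\overline{K}[T_1(y - \xi)]\,\overline{K}[-T_2(y + \xi)]\,dy.$ 

If $\xi = 0$, then, by (3.3), 

(4.8)  $U_2(T_1, T_2) \to (2\pi)^2\,|C_\mu^{(1)}|^2\,p^2(0), \quad (T_1, T_2 \to \infty,\ T_1/T_2 \to \mu),$ 

where 

(4.9)  $C_\mu^{(1)} = \mu^{1/2}\int_{-\infty}^{\infty} M(\beta)M(\mu\beta)\,d\beta.$ 

If $\xi \neq 0$, then (3.4) shows 

(4.10)  $U_2(T_1, T_2) \to 0, \quad (T_1, T_2 \to \infty,\ T_1/T_2 \to \mu).$ 

Finally we have 

$U_3(T_1, T_2) = T_1T_2\int_{-\infty}^{\infty} p(x)\,K[-T_1(x + \xi)]\,\overline{K}[-T_2(x + \xi)]\,dx \cdot \int_{-\infty}^{\infty} p(y)\,\overline{K}[-T_1(y + \xi)]\,K[-T_2(y + \xi)]\,dy,$ 

and 

(4.11)  $U_3(T_1, T_2) \to (2\pi)^2\,|C_\mu^{(2)}|^2\,p^2(\xi)$, for every $\xi$, 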
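where 

(4.12)  $C_\mu^{(2)} = \mu^{1/2}\int_{-\infty}^{\infty} M(\beta)\overline{M(\mu\beta)}\,d\beta.$ 

Inserting (4.7), (4.8), (4.10) and (4.11) into (4.4), we get: if $\xi \neq 0$, 

(4.13)  $S_2(T_1, T_2) \to (2\pi)^2\big(C_1^2 + |C_\mu^{(2)}|^2\big)\,p^2(\xi)$ 

as $T_1, T_2 \to \infty$, $T_1/T_2 \to \mu$ $(\neq 0)$, and if $\xi = 0$, 

(4.14)  $S_2(T_1, T_2) \to (2\pi)^2\big(C_1^2 + |C_\mu^{(1)}|^2 + |C_\mu^{(2)}|^2\big)\,p^2(0).$ 

Hence we get 

(4.15)  $ES(T_1)S(T_2) \to (2\pi)^2\big(C_1^2 + |C_\mu^{(2)}|^2\big)\,p^2(\xi)$, if $\xi \neq 0$, $\mu \neq 0$, 
(4.16)  $ES(T_1)S(T_2) \to (2\pi)^2\big(C_1^2 + |C_\mu^{(1)}|^2 + |C_\mu^{(2)}|^2\big)\,p^2(0)$, if $\xi = 0$, $\mu \neq 0$. 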


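We also have 

$ES(T) = \dfrac{1}{T}\int_{E_2} E\,\varepsilon(t_1)\varepsilon(t_2)\,M\!\left(\dfrac{t_1}{T}\right)\overline{M}\!\left(\dfrac{t_2}{T}\right)e^{-i\xi(t_1 - t_2)}\,dv_t = T\int_{-\infty}^{\infty} p(x)\,|K\{T(x - \xi)\}|^2\,dx,$ 

and by Lemma 2 this converges to $2\pi C_1p(\xi)$. Thus we find that 
$\operatorname{cov}\{S(T_1), S(T_2)\} = ES(T_1)S(T_2) - ES(T_1)\cdot ES(T_2)$ 
converges to 
$(2\pi)^2\,|C_\mu^{(2)}|^2\,p^2(\xi)$, if $\xi \neq 0$, 
and to 
$(2\pi)^2\big(|C_\mu^{(1)}|^2 + |C_\mu^{(2)}|^2\big)\,p^2(0)$, if $\xi = 0$, 
when $T_1$, $T_2$ increase indefinitely such that $T_1/T_2 \to \mu$ $(\mu \neq 0)$. 
In particular, var $S(T)$ converges to 
$(2\pi)^2\,|C_1^{(2)}|^2\,p^2(\xi)$, if $\xi \neq 0$, 
and to 
$(2\pi)^2\big(|C_1^{(1)}|^2 + |C_1^{(2)}|^2\big)\,p^2(0)$, if $\xi = 0$. 
Also, we easily find that 
$E\,|S(T_1) - S(T_2)|^2$ 
converges to 
$2(2\pi)^2\big(|C_1^{(2)}|^2 - |C_\mu^{(2)}|^2\big)\,p^2(\xi)$, if $\xi \neq 0$, 
and to 

$2(2\pi)^2\big(|C_1^{(1)}|^2 + |C_1^{(2)}|^2 - |C_\mu^{(1)}|^2 - |C_\mu^{(2)}|^2\big)\,p^2(0)$, if $\xi = 0$. 

Hence the theorem is proved. 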













REFERENCES 

[1] S. Bochner and T. Kawata, "Special integral transformation in Euclidean space," 
to be published. 

[2] T. Kawata, "Some convergence theorems for the stationary stochastic process," to 
be published. 

[3] T. A. Magness, "Spectral response of a quadratic device to non-Gaussian noise," J. Appl. 
Phys., Vol. 25 (1954). 

[4] E. Parzen, "On consistent estimates of the spectrum of a stationary process," Ann. 
Math. Stat., Vol. 28 (1957). 








PROOF OF SHANNON'S TRANSMISSION THEOREM FOR FINITE-STATE 
INDECOMPOSABLE CHANNELS¹ 


By DAVID BLACKWELL, LEO BREIMAN, AND A. J. THOMASIAN 
University of California, Berkeley 


1. Summary. For finite-state indecomposable channels, Shannon’s basic 
theorem, that transmission is possible at any rate less than channel capacity but 
not at any greater rate, is proved. A necessary and sufficient condition for in- 
decomposability, from which it follows that every channel with finite memory is 
indecomposable, is given. An important tool is a modification, for some processes 
which are not quite stationary, of theorems of McMillan and Breiman on proba- 
bilities of long sequences in ergodic processes. 


2. Notation, definitions. For any positive integer $N$, we denote by $I(N)$ the 
set of integers $1, 2, \dots, N$ and for any set $S$ we denote by $S^N$ the set of 
$N$-tuples $(s_1, \dots, s_N)$ with $s_i \in S$, $i \in I(N)$. 

Let $A$ be a fixed positive integer. A source is a pair $(M, \phi)$, where $M$ is a finite, 
say $D \times D$, indecomposable Markov matrix and $\phi$ is a function from $I(D)$ to 
$I(A)$. A channel is a sequence of $A$ Markov matrices $C(1), \dots, C(A)$ of the 
same size, say $R \times R$, and a function $\psi$ from $I(R)$ to $I(B)$, where $B$ is some 
positive integer. 

The elements of $I(D)$ and $I(R)$ will be considered as states of the source and 
channel respectively. The source will be considered as driving the channel as 
follows. If $d \in I(D)$, $r \in I(R)$ are the states of the source and channel at the 
beginning of a cycle, the source moves from $d$ to a state $e \in I(D)$, selected 
according to the Markov transition matrix $M$, so that $M(d, e)$ is the probability 
that the new state is $e$, given that the initial state is $d$. The source then emits the 
number $\phi(e) \in I(A)$, which is fed into the channel. The channel then moves into 
a state $s \in I(R)$, selected according to the matrix $C(\phi(e))$, and emits the number 
$\psi(s)$, completing the cycle. A new cycle then begins, with $e$, $s$ as the initial states 
of the source and channel. The joint motion of the source and channel is thus 
described by the source-channel matrix, which is a $DR \times DR$ Markov matrix $L$, 
with elements $L((d, r), (e, s)) = M(d, e)\,C(\phi(e), r, s)$. A channel will be called 
indecomposable if for every source the source-channel matrix $L$ is indecomposable. 
Thus, for any source and any indecomposable channel, there is a sequence of 
random variables $\{(d_n, r_n),\ -\infty < n < \infty\}$, which is an ergodic Markov process 
with transition matrix $L$. Moreover, the joint distribution of $\{(d_n, r_n)\}$ 
depends only on $L$. McMillan [4], extending the work of Shannon [5], has shown 
that associated with any stationary ergodic process $\{z_n\}$ with a finite set $F$ of 
states, is a number $h$, called the entropy of the process, such that for large $N$ it 
is practically certain that the sequence of states of length $N$ which occurs is one 
whose probability is about $2^{-Nh}$; more precisely, for any sequence $f \in F^N$ let 

$Q_N(f) = \operatorname{Prob}\{(z_1, \dots, z_N) = f\}.$ 

Then 

(1)  $N^{-1}\log Q_N(z_1, \dots, z_N) \to -h$ in $L_1$, as $N \to \infty$, 

where the log above and throughout this paper has base 2. Breiman [1] has 
shown that convergence with probability 1 also occurs in (1). For the ergodic 
process $\{(d_n, r_n)\}$, the processes $\{x_n = \phi(d_n)\}$, $\{y_n = \psi(r_n)\}$, $\{(x_n, y_n)\}$ are of 
course also ergodic; we denote their entropies by $H(X)$, $H(Y)$, $H(X, Y)$ 
respectively. 
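
The source-channel matrix $L$ defined in this section can be assembled mechanically from $M$, $\phi$ and the matrices $C(a)$. The sketch below is an illustration only: it uses 0-based indices, stores the state $(d, r)$ at position `d * R + r`, and the function name is hypothetical.

```python
def source_channel_matrix(M, phi, C):
    """Build L((d, r), (e, s)) = M(d, e) * C(phi(e), r, s).

    M   : D x D source transition matrix (nested lists)
    phi : list of length D giving the input letter emitted in each source state
    C   : list of R x R channel matrices, one per input letter
    The joint state (d, r) is stored at index d * R + r (an assumed layout).
    """
    D = len(M)
    R = len(C[0])
    L = [[0.0] * (D * R) for _ in range(D * R)]
    for d in range(D):
        for r in range(R):
            for e in range(D):
                for s in range(R):
                    L[d * R + r][e * R + s] = M[d][e] * C[phi[e]][r][s]
    return L
```

Each row of the result sums to one whenever `M` and every `C[a]` are Markov matrices, which gives a quick sanity check on the inputs.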

For a fixed indecomposable channel, the upper bound $H$ over all sources of the 
number $H(X) + H(Y) - H(X, Y)$ is called, following Shannon, the capacity 
of the channel. Shannon [5] and, subsequently, McMillan [4], Feinstein [2], 
Hincin [3], and Wolfowitz [6] have shown that, under various hypotheses on the 
channel, it is possible to transmit over the channel at any rate less than its 
capacity, but not at any rate greater than its capacity. For a channel as defined 
above, this means, as in [6], the following. For a given channel, to say that it is 
possible to transmit at rate $G$ means that for every $\epsilon > 0$ there is an $N_0$ such that 
for any $N \ge N_0$ there are $2^{NG} = J$ distinct sequences $u_1, \dots, u_J$, where each 
$u_j \in I(A)^N$, and $J$ disjoint subsets $E_1, \dots, E_J$ of $I(B)^N$ such that 

(2)  $Q(r, u_j, E_j) > 1 - \epsilon$ for all $j$ and all $r \in I(R)$, 

where for any $r \in I(R)$, $u = (u(1), \dots, u(N)) \in I(A)^N$, $E \subset I(B)^N$, 

$Q(r, u, E) = \sum C(u(1), r, r_1)\,C(u(2), r_1, r_2)\cdots C(u(N), r_{N-1}, r_N),$ 

where the sum is over those sequences $(r_1, \dots, r_N)$ for which 

$(\psi(r_1), \dots, \psi(r_N)) \in E.$ 

Thus $Q(r, u, E)$ is the probability that the output sequence from the channel is an 
element of $E$, when the channel is initially in state $r$ and $u$ is the input sequence. 
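
For small examples the quantity $Q(r, u, E)$ can be evaluated directly from its definition by summing over state sequences. The sketch below is a brute-force illustration with cost exponential in the block length, using 0-based states and letters; the function name and the representation of $E$ as a set of output tuples are assumptions made here.

```python
from itertools import product

def transmission_probability(C, psi, r, u, E):
    """Q(r, u, E): probability that the output sequence lies in E when the
    channel starts in state r and receives the input sequence u.

    C   : list of R x R channel matrices, one per input letter
    psi : list of length R giving the output letter of each channel state
    E   : set of output tuples of the same length as u
    """
    R = len(C[0])
    total = 0.0
    for path in product(range(R), repeat=len(u)):
        if tuple(psi[s] for s in path) not in E:
            continue
        prob, prev = 1.0, r
        for letter, state in zip(u, path):
            prob *= C[letter][prev][state]   # one factor C(u(n), r_{n-1}, r_n)
            prev = state
        total += prob
    return total
```

Taking `E` to be the full set of output sequences returns 1, since each `C[a]` is a Markov matrix.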

For a given channel, denote by $H^*$ the upper bound of the rates $G$ at which it 
is possible to transmit. We shall show that, for indecomposable channels of the 
type considered here, $H^* = H$, that is, it is possible to transmit at any rate less 
than the channel capacity, but not at a rate greater than channel capacity. 
Shannon and McMillan seem to have regarded $H^* \le H$ as more or less obvious, 
and devoted most of their attention to showing, under certain hypotheses, that 
$H \le H^*$. The other writers have given some attention to the inequality $H^* \le H$. 
In particular, Wolfowitz [6] obtained $H^* \le H$ for channels of zero memory. 
Our result, that $H^* = H$ for indecomposable channels, extends those 
obtained previously. 
3. A necessary and sufficient condition for indecomposability. To verify that 
the results to be proved in Sections 5 and 6 are valid for a given channel, we 
must show that the channel is indecomposable. The following criterion is helpful. 

THEOREM 1. A channel $(C(1), \dots, C(A))$ is indecomposable if and only if every 
finite product $C(a_1)\cdots C(a_k)$ is an indecomposable Markov matrix, $k = 1, 2, \dots$, 
$a_i \in I(A)$. 

PROOF. Suppose the channel is indecomposable and let $a_1, \dots, a_k$ be any 
finite sequence of elements of $I(A)$. Consider the source with $k$ states $1, \dots, k$ 
with $M(i, i + 1) = 1$ for $i < k$, $M(k, 1) = 1$, and $\phi(i) = a_i$. Let 

$F = C(a_1)\cdots C(a_k)$ 

and let $r_1, r_2 \in I(R)$. To show that $F$ is indecomposable it is sufficient to find 
integers $T_1$, $T_2$ and a state $r_3 \in I(R)$ such that $F^{T_1}(r_1, r_3) > 0$ and $F^{T_2}(r_2, r_3) > 0$, 
that is, such that $r_3$ is reachable from either $r_1$ or $r_2$ under transition matrix $F$. 
Since the source-channel matrix $L$ is indecomposable, the two states $(k, r_1)$, 
$(k, r_2)$ have a common possible successor $(i, r)$ which itself has a possible 
successor of the form $(k, r_3)$. Thus $(k, r_3)$ is a possible successor of either $(k, r_1)$ or 
$(k, r_2)$. Since the source has period $k$, the times after which $(k, r_3)$ can be 
reached from $(k, r_1)$ or $(k, r_2)$ are multiples of $k$, that is, there are integers 
$T_1$, $T_2$ such that $L^{kT_i}((k, r_i), (k, r_3)) > 0$ for $i = 1, 2$. But $L^{kT}((k, r), (k, s)) = F^T(r, s)$. 
Consequently $F^{T_i}(r_i, r_3) > 0$ for $i = 1, 2$ and $F$ is indecomposable. 

Now suppose that every finite product $C(a_1)\cdots C(a_k)$ is indecomposable, and 
let $(M, \phi)$ be any source. Let $(d, r)$, $(e, s)$ be any two source-channel states; we 
must find a common possible successor $(f, t)$. Since $M$ is indecomposable, $d$ and $e$ 
have a common possible successor $f$ which is recurrent. There are then numbers 
$r'$, $s'$, such that $(f, r')$ is a successor of $(d, r)$ and $(f, s')$ is a successor of $(e, s)$, so 
that any common successor of $(f, r')$ and $(f, s')$ is also a common successor of 
$(d, r)$ and $(e, s)$. Thus we may suppose $d = e = f$, and must find a common 
successor $(f, t)$ of $(f, r')$, $(f, s')$, where $f$ is recurrent. Let $f_0 = f, f_1, \dots, f_{k-1}, f_k = f$ 
be a possible path from $f$ to itself, and let $F = C(\phi(f_1))\cdots C(\phi(f_k))$. We assert 
that if $t$ is a possible successor of $r'$ with respect to $F$, then $(f, t)$ is a possible 
successor of $(f, r')$ in the source-channel matrix $L$. For $L^{kT}((f, r'), (f, t)) \ge 
[M(f_0, f_1)\cdots M(f_{k-1}, f_k)]^T\,F^T(r', t)$, and since the first factor on the right is 
positive, the left side is positive whenever $F^T(r', t)$ is. But since $F$ is indecomposable, 
$r'$ and $s'$ have a common possible successor $t$ with respect to $F$, so that $(f, t)$ is 
a common possible successor of $(f, r')$, $(f, s')$ in $L$, completing the proof. 

We shall say that a channel has memory $m$ if every product $C(a_0)\cdots C(a_m)$ 
has identical rows. Thus a channel has memory $m$ if and only if the conditional 
distribution of the present state of the channel, given the present input $a_m$, 
the $m$ previous inputs $a_0, \dots, a_{m-1}$ and the state $r$ of the channel just prior to 
input $a_0$, is independent of $r$ for every $a_0, \dots, a_m$. A channel is said to have 
finite memory if for some $m$ it has memory $m$. Every channel with finite memory 
is clearly indecomposable, for if $F = C(a_1)\cdots C(a_k)$, some power of $F$ has 
identical rows so that $F$ is indecomposable. From Theorem 1, the channel is then 
indecomposable. That this includes, as a special case, the finite memory channels 
as defined by Feinstein [2] and Wolfowitz [6] can be seen from the following 
considerations: let the inputs to a channel be denoted by $\dots, X_{-1}, X_0, X_1, \dots$ 
and the outputs by $\dots, Y_{-1}, Y_0, Y_1, \dots$ and let the probability structure of 
the channel be defined, following McMillan [4], by specifying the conditional 
probabilities of the various output messages, given the input signals. That is, we 
are given the conditional probabilities $p(Y_n, Y_{n-1}, \dots \mid X_n, X_{n-1}, \dots)$ where 
we are now assuming that the channel is nonanticipatory and stationary. We 
assume, in addition, that there is an integer $m$ such that 

$p(Y_n \mid X_n, Y_{n-1}, X_{n-1}, Y_{n-2}, X_{n-2}, \dots) = p(Y_n \mid X_n, Y_{n-1}, X_{n-1}, \dots, Y_{n-m}, X_{n-m}).$ 

Now if we consider the finite state channel whose states consist of $m$-tuples of 
pairs, one member of the pairs being from the input alphabet and the other from 
the output alphabet, then the above assumption implies that this finite state 
channel is finitary in the sense described above, that is, it has the required 
Markov property. If we add the additional restriction that there is an integer 
$M$ such that if two output messages $m$ long, say $y_1$, $y_2$, are separated by a 
distance $M$, then 

$P(y_1, y_2 \mid \dots, X_{-1}, X_0, X_1, \dots) = P(y_1 \mid \dots, X_{-1}, X_0, X_1, \dots)\,P(y_2 \mid \dots, X_{-1}, X_0, X_1, \dots),$ 

then this finite state channel has finite memory $M$. 
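
The definition of memory $m$ given at the start of this discussion, that every product of $m + 1$ channel matrices has identical rows, can be checked directly for a small channel. The following is a minimal sketch with hypothetical function names and a numerical tolerance chosen here for floating-point comparison.

```python
from itertools import product

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def has_identical_rows(P, tol=1e-12):
    return all(abs(P[i][j] - P[0][j]) <= tol
               for i in range(len(P)) for j in range(len(P)))

def has_memory(C, m, tol=1e-12):
    """True if every product C(a_0)...C(a_m) of m + 1 channel matrices has
    identical rows, i.e. the channel has memory m in the sense defined above."""
    for letters in product(range(len(C)), repeat=m + 1):
        F = C[letters[0]]
        for a in letters[1:]:
            F = matmul(F, C[a])
        if not has_identical_rows(F, tol):
            return False
    return True
```

Searching `m = 0, 1, 2, ...` until `has_memory` succeeds gives the smallest memory of a finite-memory channel; for a channel without finite memory the search does not terminate, so a cap on `m` would be needed in practice.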


4. A modification of McMillan's theorem. In proving our main result, we 
shall need the following extension of a special case of McMillan's theorem. 

THEOREM 2. Let $d_1, d_2, \dots$ be a Markov process with finite indecomposable 
transition matrix $M$, say $D \times D$, let $\phi$ be a function from $I(D)$ to $I(A)$, and let 
$y_n = \phi(d_n)$. For any sequence $s \in I(A)^N$ let $p(s) = P\{(y_1, \dots, y_N) = s\}$, and let 
$z_N = p(y_1, \dots, y_N)$. There is a constant $h$, depending only on $M$ and $\phi$, such that 

(3)  $N^{-1}\log z_N \to -h$ 

in $L_1$ and with probability 1 as $N \to \infty$. 

PROOF. If the distribution of $d_1$ is the (unique) stationary distribution for $M$, 
the $\{y_n\}$ process is ergodic, and the theorems of McMillan [4] and Breiman [1] 
yield (3), with $h$ as the entropy of the process. 

For any $d \in I(D)$ and any event $E$, write $P_d(E)$ for $P(E \mid d_1 = d)$. Let $\lambda = 
(\lambda_1, \dots, \lambda_D)$ be the stationary distribution for $M$, and let $Q(E) = \sum_d \lambda_dP_d(E)$. 
The theorems of McMillan and Breiman assert that 

(4)  $\dfrac{\log \sum_d \lambda_dz_{dN}}{N} \to -h$ a.e. and $L_1(Q)$, 

where $p_d(s) = P_d\{(y_1, \dots, y_N) = s\}$ and $z_{dN} = p_d(y_1, \dots, y_N)$. For any $d$ 
for which $\lambda_d > 0$, we have 

$\lambda_dz_{dN} = \big(\sum_e \lambda_ez_{eN}\big)\,Q(d_1 = d \mid y_1, \dots, y_N).$ 

Taking logs, dividing by $N$, letting $N \to \infty$ and using (4) and the fact that 
$Q(d_1 = d \mid y_1, \dots, y_N)$ converges a.e. $(Q)$ to a limit which is positive a.e. $(P_d)$ 
yields 

(5)  $\dfrac{\log z_{dN}}{N} \to -h$ a.e. $P_d$ for $\lambda_d > 0$. 


Now let $d$ be a state for which $\lambda_d = 0$, let $e$ be any state for which $\lambda_e > 0$, let $k$ 
be any integer $< N$ and let $G$ denote the event $\{d_k = e\}$. We have 

(6)  $z_{dN}\,P_d(G \mid y_1, \dots, y_N) = z_{dk}\,P_d(G \mid y_1, \dots, y_k)\,p_e(y_k, \dots, y_N).$ 

Since the $P_d$ conditional distribution of $y_k, y_{k+1}, \dots$, given that $G$ occurs, is the 
same as the unconditional $P_e$ distribution of $y_1, y_2, \dots$, we conclude from (5) 
that on $G$, a.e. $P_d$, $N^{-1}\log p_e(y_k, \dots, y_N) \to -h$. Also, on $G$, a.e. $P_d$, 
$P_d(G \mid y_1, \dots, y_N)$ has a positive limit as $N \to \infty$ and $z_{dk}P_d(G \mid y_1, \dots, y_k)$ is 
positive. Taking logs in (6), dividing by $N$, letting $N \to \infty$ yields 

(7)  $N^{-1}\log z_{dN} \to -h$ a.e. $P_d$ on $G$. 

Since the union of the sets $G$ obtained by varying $k$ and $e$ has $P_d$ measure 1, we 
conclude 

(8)  $N^{-1}\log z_{dN} \to -h$ a.e. $P_d$ for all $d$. 

Next let $\mu = (\mu_1, \dots, \mu_D)$ be any initial distribution and let $P = \sum_d \mu_dP_d$. 
For any $d$ for which $\mu_d > 0$, we have 

(9)  $\mu_dz_{dN} = \big(\sum_e \mu_ez_{eN}\big)\,P(d_1 = d \mid y_1, \dots, y_N).$ 

Taking logs, dividing by $N$, letting $N \to \infty$ and using (8) yields 

(10)  $N^{-1}\log\big(\sum_e \mu_ez_{eN}\big) \to -h$ a.e. $P_d$, 

from which we obtain 

(11)  $N^{-1}\log\big(\sum_e \mu_ez_{eN}\big) \to -h$ a.e. $P$. 


Thus the probability 1 convergence in (3) is established. Finally, to obtain $L_1$ 
convergence we note, following McMillan, that the sequence $\{N^{-1}\log z_N\}$ is 
uniformly integrable. We have 

(12)  $J(N, k) = \int_{B_k} |N^{-1}\log z_N|\,dP = N^{-1}\sum p(s)\,|\log p(s)| \le (k + 1)\,2^{-Nk}A^N,$ 

where $B_k$ is the event $\{k \le |N^{-1}\log z_N| < k + 1\}$ and the sum is extended over 
those $s \in I(A)^N$ for which $k \le |N^{-1}\log p(s)| < k + 1$. Choose $k_1$ so that 
$2^{-k_1}A < 1$. For $k \ge k_1$ we have 

(13)  $J(N, k) \le (k + 1)\,2^{-(k - k_1)}.$ 

Thus $\sum_{k \ge k_0} J(N, k)$ goes to zero as $k_0 \to \infty$ uniformly in $N$, and uniform 
integrability is established, completing the proof. 


5. The direct half of Shannon's theorem (possibility of transmission at every 
rate less than capacity). We shall need the following lemma. 

LEMMA. Let $p$ be a probability distribution on a finite product space $X \times Y$. 
Write $a(x) = \sum_y p(x, y)$, $b(y) = \sum_x p(x, y)$, $p(y \mid x) = p(x, y)/a(x)$. For any 
numbers $\delta$, $\lambda$ such that $0 < \delta \le \lambda < 1$, let 

$A = \{y : b(y) > \delta\}, \quad B = \{(x, y) : p(y \mid x) < \lambda\}.$ 

For any integer $M$ there are $M$ points $x_1, \dots, x_M \in X$ and $M$ disjoint subsets 
$E_1, \dots, E_M$ of $Y$ such that 

(14)  $\sum_{y \notin E_i} p(y \mid x_i) \le 4M(\delta/\lambda) + 2\sum_{y \in A} b(y) + 2\sum_{(x, y) \in B} p(x, y)$ 

for $i = 1, \dots, M$. 
PROOF. Let $X_1, \dots, X_{2M}$ be independent random variables with distribution 
$a(x)$. For each $i \in I(2M)$, $y \in Y$, we define the random variable $Z(i, y) = 1$ if 
$p(y \mid X_i) \le \max_{j \neq i} p(y \mid X_j)$, $Z(i, y) = 0$ otherwise, and define 

$f_i = \sum_y p(y \mid X_i)\,Z(i, y).$ 

Then 

(15)  $Ef_i = \sum_x a(x)\,E(f_i \mid X_i = x) = \sum_{x, y} p(x, y)\,E(Z(i, y) \mid X_i = x) \le \sum_{y \in A} b(y) + \sum_{(x, y) \in B} p(x, y) + {\sum}^* p(x, y)\,E(Z(i, y) \mid X_i = x),$ 

where ${\sum}^*$ indicates summation over pairs $(x, y)$ for which $b(y) \le \delta$ and 
$p(y \mid x) \ge \lambda$. Now $E(Z(i, y) \mid X_i = x) = 1 - (1 - u(x, y))^{2M-1}$, where 

$u(x, y) = \sum_{v:\ p(y \mid v) \ge p(y \mid x)} a(v).$ 

For pairs $(x, y)$ in ${\sum}^*$, 

$\delta \ge b(y) = \sum_v a(v)\,p(y \mid v) \ge \lambda\sum_{v:\ p(y \mid v) \ge \lambda} a(v) \ge \lambda\,u(x, y),$ 

so that 

$E(Z(i, y) \mid X_i = x) \le 1 - (1 - (\delta/\lambda))^{2M-1} \le 2M\delta/\lambda.$ 
Using this inequality in (15) yields 

(16)  $Ef_i \le \sum_{y \in A} b(y) + \sum_{(x, y) \in B} p(x, y) + 2M\delta/\lambda = \alpha.$ 

It follows that $E\big(\sum_{i=1}^{2M} f_i/2M\big) \le \alpha$. Thus there are values of $X_1, \dots, X_{2M}$, 
say $x_1^*, \dots, x_{2M}^*$, for which $\sum_{i=1}^{2M} f_i^*/2M \le \alpha$, where $f_i^* = f_i(x_1^*, \dots, x_{2M}^*)$. 
Since all $f_i^*$ are $\ge 0$, at least $M$ of them, say $f_{i_1}^*, \dots, f_{i_M}^*$, are $\le 2\alpha$. Then 
$2\alpha \ge \sum p(y \mid x_{i_k}^*)$ where the sum is over $y$ for which 
$p(y \mid x_{i_k}^*) \le \max_{j \neq k} p(y \mid x_{i_j}^*)$. 
Denoting $x_{i_k}^*$ by $x_k$ and the set of $y$ for which 
$p(y \mid x_{i_k}^*) > \max_{j \neq k} p(y \mid x_{i_j}^*)$ 
by $E_k$ yields (14), and the lemma is proved. 

THEOREM 3. For any indecomposable channel, $H^* \ge H$, that is, it is possible to 
transmit at any rate less than the capacity of the channel. 

PROOF. Let $(M, \phi)$ be any source and let $\{(d_n, r_n),\ n = 0, 1, 2, \dots\}$ be a 
Markov process whose transition matrix is the source-channel matrix and with 
$(d_0, r_0)$ having a uniform distribution on the $DR$ states. Let $x_n = \phi(d_n)$, $y_n = \psi(r_n)$. 
For any $s \in I(A)^N$, $t \in I(B)^N$, $r \in I(R)$, write 

$a(s) = P((x_1, \dots, x_N) = s), \quad b(t) = P((y_1, \dots, y_N) = t),$ 
$Q(r, s, t) = P((x_1, \dots, x_N) = s,\ (y_1, \dots, y_N) = t,\ r_0 = r)\,R/a(s),$ 
$p(s, t) = P((x_1, \dots, x_N) = s,\ (y_1, \dots, y_N) = t) = a(s)\sum_r Q(r, s, t)/R.$ 

According to Theorem 2, as $N \to \infty$, 

$N^{-1}\log a(x_1, \dots, x_N) \to -H(X),$ 
$N^{-1}\log b(y_1, \dots, y_N) \to -H(Y),$ 
$N^{-1}\log p(x_1, \dots, x_N, y_1, \dots, y_N) \to -H(X, Y).$ 

Given $\epsilon > 0$, choose $N$ so large that, with probability $\ge 1 - \epsilon$, 

$\dfrac{\log p(x_1, \dots, x_N, y_1, \dots, y_N) - \log a(x_1, \dots, x_N)}{N} \ge H(X) - H(X, Y) - \epsilon$ 

and 

$\dfrac{\log b(y_1, \dots, y_N)}{N} \le -H(Y) + \epsilon.$ 

We apply the lemma to the product space $U \times V$, where $U = I(A)^N$, $V = I(B)^N$, 
with $p(u, v)$ as defined above and $\delta = 2^{-N(H(Y) - \epsilon)}$, $\lambda = 2^{-N(H(X, Y) - H(X) + \epsilon)}$, 
and conclude the existence of $M = 2^{NG}$, say, points $u_1, \dots, u_M \in U$ and $M$ 
disjoint subsets $E_1, \dots, E_M$ of $V$ such that 

$\sum_{v \notin E_i} p(v \mid u_i) \le 4\cdot 2^{-N(H(X) + H(Y) - H(X, Y) - G - 2\epsilon)} + 8\epsilon.$ 
Thus for any $G < H(X) + H(Y) - H(X, Y)$ we can, for any $\beta > 0$, by first 
choosing $\epsilon$ sufficiently small (less than $\min(\beta/9,\ (H(X) + H(Y) - H(X, Y) - G)/2)$) 
and then choosing $N$ sufficiently large, find $M = 2^{NG}$ $X$-sequences $u_1, \dots, u_M$ 
of length $N$ and $M$ disjoint subsets $E_1, \dots, E_M$ of $I(B)^N$ such that 

(17)  $\sum_{v \in E_i} p(v \mid u_i) > 1 - \beta.$ 

This does not quite prove that it is possible to transmit at rate $G$ as defined above, 
since (2) requires that 

$\sum_{v \in E_i} Q(r, u_i, v) > 1 - \epsilon$ for all $r \in I(R)$, 

that is, that for each initial state of the channel, each of the $M$ messages can be 
correctly recovered, with large probability. This is an immediate consequence 
of (17), however, since (17) yields 

$R^{-1}\sum_r \big(\sum_{v \in E_i} Q(r, u_i, v)\big) > 1 - \beta$ 

so that, since $Q(r, u_i, E_i) \le 1$ for all $r$, $i$, 

$\sum_{v \in E_i} Q(r, u_i, v) > 1 - R\beta$ 

for each $r$. Since $\beta$ can be made arbitrarily small and $R$ is a fixed number, the 
number of states of the channel, the proof is complete. 


6. The converse half of Shannon’s theorem (impossibility of transmission at a 
rate greater than capacity). 

THEOREM 4. For any indecomposable channel, H* < H, that is, it is not possibl. 
to transmit at a rate greater than the capacity of the channel. 


PROOF. Suppose that it is possible to transmit over a given channel at rate $G$,
let $\epsilon$ be given, $0 < \epsilon < \tfrac{1}{2}$, and let $N$,
$u_1, \ldots, u_J$, $J = 2^{NG}$, $E_1, \ldots, E_J$ denote the quantities whose
existence is implied by the possibility of transmission at rate $G$. We may
suppose that $\bigcup_j E_j = I(B)^N$, since if (2) is satisfied for $E_j$ it is
also satisfied if $E_j$ is replaced by a superset. We must exhibit a source
$(M, \phi)$ for which $H(X) + H(Y) - H(X, Y)$ is nearly $G$. Our source produces
inputs in blocks of $N$ by selecting one of the $u_j$ at random, successive
choices being independent. The entropy $H(X)$ will then be precisely $G$. Since
observing a long $y$ sequence nearly identifies the corresponding $x$ sequence,
the conditional entropy $H(X, Y) - H(Y)$ is small, so that
$H(X) + H(Y) - H(X, Y)$ is nearly $G$.

More precisely, the input source will have $NJ$ states $(n, j)$, with
$M((n, j), (n+1, j)) = 1$ for $n < N$, $M((N, j), (1, i)) = 1/J$ for
$i \in I(J)$. We define $\phi(n, j) = u_{jn}$, the $n$th symbol in the sequence
$u_j$. Let $(d_k, r_k)$ be a Markov process whose transition matrix is the
source-channel matrix, and whose initial distribution is such that
$d_1 = (1, j)$ with probability $1/J$, $j \in I(J)$, and write
$x_k = \phi(d_k)$, $y_k = \psi(r_k)$. Then every $x$ sequence of length $NT$
which is possible has probability $J^{-T} = 2^{-NTG}$ (since
$\epsilon < \tfrac{1}{2}$, $u_i \ne u_j$ for $i \ne j$). From Theorem 2,
$H(X) = G$.

To estimate $H(X, Y) - H(Y)$, we recall some results of Shannon [5]. If $x$ is
any random variable assuming $T$ distinct values with probabilities
$p_1, \ldots, p_T$, the number $-\sum_i p_i \log p_i$ is called the entropy of
$x$ and will be denoted by $h(x)$. Always $h(x) \le \log T$. If $(x, y)$ are two
random variables, each with a finite set of values, the number
$h(x, y) - h(y)$ is called the conditional entropy of $x$ given $y$ and is
denoted by $h(x \mid y)$. It equals the expected value of the entropy of the
conditional distribution of $x$ given $y$. For any function $\phi$ defined on the
range of $y$, $h(\phi(y)) \le h(y)$ and $h(x \mid \phi(y)) \ge h(x \mid y)$.
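The entropy facts just recalled can be checked numerically. The following is only an illustrative sketch: the joint distribution `joint`, the helper names and the merging function `phi` are assumptions made for this example and do not come from the paper; logarithms are taken to base 2.

```python
from math import log2
from collections import defaultdict

def entropy(probs):
    """h = -sum p log2 p, with 0 log 0 taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginal(joint, index):
    """Marginal distribution of coordinate `index` of a dict {(x, y): prob}."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[index]] += p
    return dict(out)

def conditional_entropy(joint):
    """h(x | y) = h(x, y) - h(y) for a joint distribution {(x, y): prob}."""
    return entropy(joint.values()) - entropy(marginal(joint, 1).values())

def coarsen(joint, phi):
    """Joint distribution of (x, phi(y))."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(x, phi(y))] += p
    return dict(out)

# Hypothetical joint distribution of (x, y): x in {0, 1}, y in {0, 1, 2}.
joint = {(0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.05,
         (1, 0): 0.05, (1, 1): 0.20, (1, 2): 0.30}
phi = lambda y: min(y, 1)          # merges the values y = 1 and y = 2

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_x_given_y = conditional_entropy(joint)
coarse = coarsen(joint, phi)
h_phi_y = entropy(marginal(coarse, 1).values())
h_x_given_phi_y = conditional_entropy(coarse)
```

Coarsening $y$ through `phi` can only lose information about $x$, which is why `h_x_given_phi_y` should not fall below `h_x_given_y`.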

Notice that, in the notation of Theorem 2, $E \log z_N = -h(y_1, \ldots, y_N)$, so
that the $L_1$ convergence in (2) implies that $h(y_1, \ldots, y_N)/N \to H$ as
$N \to \infty$. Thus, in our present notation,

$$h(x_1, \ldots, x_{NT} \mid y_1, \ldots, y_{NT})/NT \to H(X, Y) - H(Y)$$

as $T \to \infty$. We have

$$h(x_1, \ldots, x_{NT} \mid y_1, \ldots, y_{NT})
 \le \sum_{t=0}^{T-1} h(x_{Nt+1}, \ldots, x_{Nt+N} \mid y_{Nt+1}, \ldots, y_{Nt+N})
 \le \sum_{t=0}^{T-1} h(a_t \mid b_t),$$

where $a_t = (x_{Nt+1}, \ldots, x_{Nt+N})$ and $b_t = u_j$ if
$(y_{Nt+1}, \ldots, y_{Nt+N}) \in E_j$ (we may suppose that
$\bigcup_j E_j = I(B)^N$). We estimate $h(a_t \mid b_t)$ by the following lemma.

LEMMA. For any distribution $\alpha$ on a product space $U \times U$ of pairs
$(a, b)$ such that $\sum_a \alpha(a, a) \ge 1 - \epsilon > \tfrac{1}{2}$ we have

$$h(a \mid b) \le -g(\epsilon) + \epsilon \log (J - 1),$$

where $g(t) = t \log t + (1 - t) \log (1 - t)$, $0 \le t \le 1$, and $J$ is the
number of elements of $U$.
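PROOF OF THE LEMMA. Let $\beta(b) = \sum_a \alpha(a, b)$. Then

$$-h(a \mid b) = \sum_b \beta(b) \sum_a \frac{\alpha(a, b)}{\beta(b)} \log \frac{\alpha(a, b)}{\beta(b)}.$$

Now

$$\sum_a \frac{\alpha(a, b)}{\beta(b)} \log \frac{\alpha(a, b)}{\beta(b)}
 = \frac{\alpha(b, b)}{\beta(b)} \log \frac{\alpha(b, b)}{\beta(b)}
 + \frac{\beta(b) - \alpha(b, b)}{\beta(b)} \log \frac{\beta(b) - \alpha(b, b)}{\beta(b)}
 + \frac{\beta(b) - \alpha(b, b)}{\beta(b)} \sum_{a \ne b} \frac{\alpha(a, b)}{\beta(b) - \alpha(b, b)} \log \frac{\alpha(a, b)}{\beta(b) - \alpha(b, b)}$$

$$\ge g\!\left(\frac{\alpha(b, b)}{\beta(b)}\right) - \frac{\beta(b) - \alpha(b, b)}{\beta(b)} \log (J - 1).$$

Consequently, since $\sum_b [\beta(b) - \alpha(b, b)] \le \epsilon$,

$$-h(a \mid b) \ge \sum_b \beta(b)\, g\!\left(\frac{\alpha(b, b)}{\beta(b)}\right) - \epsilon \log (J - 1).$$

Since $g(t)$ is convex and $\sum_b \beta(b) = 1$,

$$\sum_b \beta(b)\, g\!\left(\frac{\alpha(b, b)}{\beta(b)}\right)
 \ge g\Big(\sum_b \alpha(b, b)\Big) \ge g(1 - \epsilon) = g(\epsilon),$$

which proves the lemma.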
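As a sanity check of the lemma, the sketch below draws random distributions on $U \times U$ whose diagonal mass is $1 - \epsilon$ and compares $h(a \mid b)$ with $-g(\epsilon) + \epsilon \log (J - 1)$. The function names, the choice $J = 4$, $\epsilon = 0.2$, the random seed and the use of base-2 logarithms are all assumptions made for illustration.

```python
from math import log2
import random

def g(t):
    """g(t) = t log t + (1 - t) log(1 - t), with 0 log 0 = 0 (base 2)."""
    return sum(s * log2(s) for s in (t, 1 - t) if s > 0)

def cond_entropy(alpha):
    """h(a | b) = h(a, b) - h(b) for a matrix alpha[a][b] of probabilities."""
    J = len(alpha)
    h_ab = -sum(p * log2(p) for row in alpha for p in row if p > 0)
    beta = [sum(alpha[a][b] for a in range(J)) for b in range(J)]
    h_b = -sum(p * log2(p) for p in beta if p > 0)
    return h_ab - h_b

def lemma_bound(alpha):
    """Return (h(a|b), -g(eps) + eps*log(J-1)) with eps = 1 - sum_a alpha(a, a)."""
    J = len(alpha)
    eps = 1 - sum(alpha[a][a] for a in range(J))
    return cond_entropy(alpha), -g(eps) + eps * log2(J - 1)

def random_alpha(J, eps, rng):
    """Random distribution on U x U with diagonal mass 1 - eps."""
    diag = [rng.random() for _ in range(J)]
    off = [[rng.random() if a != b else 0.0 for b in range(J)] for a in range(J)]
    d_sum = sum(diag)
    o_sum = sum(map(sum, off))
    return [[(1 - eps) * diag[a] / d_sum if a == b else eps * off[a][b] / o_sum
             for b in range(J)] for a in range(J)]

rng = random.Random(0)
results = [lemma_bound(random_alpha(J=4, eps=0.2, rng=rng)) for _ in range(200)]
```

Each pair in `results` holds the conditional entropy and the corresponding bound; the lemma asserts that the first never exceeds the second.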


The hypotheses of the lemma are satisfied for $(a_t, b_t)$, so that

$$h(a_t \mid b_t) \le -g(\epsilon) + \epsilon \log J = -g(\epsilon) + \epsilon N G.$$

Thus

$$h(x_1, \ldots, x_{NT} \mid y_1, \ldots, y_{NT}) \le T(-g(\epsilon) + \epsilon N G).$$

Dividing by $NT$ and letting $T \to \infty$ yields

$$H(X, Y) - H(Y) \le -g(\epsilon)/N + \epsilon G.$$

Thus, assuming that transmission at rate $G$ is possible we have, for every
$\epsilon > 0$ and arbitrarily large $N$, exhibited a source for which

$$H(X) + H(Y) - H(X, Y) \ge G(1 - \epsilon) + g(\epsilon)/N.$$

It follows that $H \ge H^*$ and the proof is complete.

7. Another form of Shannon's Theorem. Let $\{w_n, n = 1, 2, \ldots\}$ be any
stationary ergodic process whose variables have a finite set of values, say
$I(W)$, and consider a given indecomposable channel as defined above. Shannon
enquires whether the channel is adequate for transmitting the information
produced by the source, with large probability of correct reception. To say that
the channel is adequate means that, for every $\epsilon > 0$, there is an integer
$N_0$ such that for any $N \ge N_0$ there are (1) a function $f$ (the encoder)
from $I(W)^N$ to $I(A)^N$ and (2) a function $g$ (the decoder) from $I(B)^N$ to
$I(W)^N$ such that, for every initial state $r$ of the channel,

$$\pi_r\{\alpha = \beta\} > 1 - \epsilon,$$

where $\alpha$ and $\beta$ are random variables (the first $N$ symbols produced by
the source and the decoded estimate for these symbols respectively) whose joint
distribution $\pi_r$ is defined by

$$\pi_r\{\alpha = v, \beta = v'\} = \mathrm{Prob}\{(w_1, \ldots, w_N) = v\} \sum_{b:\ g(b) = v'} Q(r, f(v), b),$$

where $Q(r, u, b)$, as defined earlier, is the probability that the channel, when
initially in state $r$, on receiving an input $u$, will produce output $b$. The
form in which Shannon describes his result is the following.
THEOREM 5. An indecomposable channel of capacity $H$ is adequate for the
stationary ergodic source $\{w_n\}$ if the entropy $h$ of $\{w_n\}$ is less than
$H$, and not if $h > H$.

The idea of the proof of this result, based on McMillan's theorem and Theorems
3 and 4 above, is extremely simple. According to McMillan's theorem, the
source $w_n$ is very likely to produce one of about $2^{Nh}$ sequences of length
$N$, each of which has probability about $2^{-Nh}$. Accordingly, to have a large
probability of transmitting the actual sequence accurately, it is necessary and
sufficient that the channel be able to distinguish among about $2^{Nh}$ different
input sequences of length $N$ which, by Theorem 3, it is if $h < H$ and is not if
$h > H$. The proof below simply makes this idea precise.
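The counting idea behind McMillan's theorem can be illustrated in the simplest special case. The sketch below assumes an independent Bernoulli($p$) source, which is only one example of a stationary ergodic process; the function name `typical_set` and the parameter values are hypothetical. It counts the sequences of length $N$ whose probability lies between $2^{-N(h + \epsilon)}$ and $2^{-N(h - \epsilon)}$ and adds up their probability.

```python
from math import comb, log2

def typical_set(N, p, eps):
    """Count and total probability of length-N sequences from an i.i.d.
    Bernoulli(p) source whose probability lies within 2^{-N(h +/- eps)}."""
    h = -(p * log2(p) + (1 - p) * log2(1 - p))    # entropy per symbol, in bits
    count, prob = 0, 0.0
    for k in range(N + 1):                        # k = number of ones
        # -(1/N) log2 of the probability of any sequence with k ones
        rate = -(k * log2(p) + (N - k) * log2(1 - p)) / N
        if abs(rate - h) <= eps:
            count += comb(N, k)
            prob += comb(N, k) * p**k * (1 - p)**(N - k)
    return h, count, prob

h50, count50, prob50 = typical_set(50, 0.2, 0.1)
h200, count200, prob200 = typical_set(200, 0.2, 0.1)
```

The number of such sequences cannot exceed $2^{N(h + \epsilon)}$, since each has probability at least $2^{-N(h + \epsilon)}$, while their total probability grows toward 1 as $N$ increases.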

PROOF. From (1), for any $\epsilon > 0$ there is an $N_1$ such that for any
$N \ge N_1$ there is a set $F \subset I(W)^N$ with not more than
$2^{N(h + \epsilon)}$ elements such that

$$\mathrm{Prob}\{(w_1, \ldots, w_N) \in F\} > 1 - \epsilon.$$

From Theorem 3 there is an $N_2 \ge N_1$ such that for any $N \ge N_2$ there are
at least $J = 2^{N(H - \epsilon)}$ distinct sequences $u_1, \ldots, u_J$ in
$I(A)^N$ and $J$ disjoint subsets $E_1, \ldots, E_J$ of $I(B)^N$ such that

$$Q(r, u_j, E_j) > 1 - \epsilon \quad \text{for all } j \text{ and } r.$$

If $H - \epsilon \ge h + \epsilon$ there are at most $J$ elements in $F$, so that
there is a function $f$ from $I(W)^N$ to $I(A)^N$ such that $f$ maps distinct
elements of $F$ onto distinct $u_j$. With this $f$ and with $g$ chosen so that
$g(b) \in F$, $f[g(b)] = u_j$ for all $b \in E_j$ we have

$$\pi_r\{\alpha = \beta\} > (1 - \epsilon)^2,$$

since the probability that $\alpha \in F$ is greater than $1 - \epsilon$ and the
conditional probability, given that $\alpha = \alpha_0 \in F$, that
$\beta = \alpha_0$ is at least $Q(r, u_j, E_j) > 1 - \epsilon$, where
$f(\alpha_0) = u_j$. Thus if $h < H$, the channel is adequate.


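Conversely, suppose the channel is adequate. From (1), for any $\epsilon > 0$
there is an $N_0$ such that for any $N \ge N_0$ there is a set
$F_1 \subset I(W)^N$ such that

$$\mathrm{Prob}\{(w_1, \ldots, w_N) \in F_1\} > 1 - \epsilon$$

and $\mathrm{Prob}\{(w_1, \ldots, w_N) = \alpha_0\} < 2^{-(h - \epsilon)N}$ for all
$\alpha_0 \in F_1$. Also, there is an $N_1 \ge N_0$ such that for every
$N \ge N_1$ there are functions $f$, $g$ satisfying the definition of adequacy.
Since $\pi_r\{\alpha = \beta\} > 1 - \epsilon$, there is a subset $F_2$ of
$I(W)^N$ such that $\pi_r\{\alpha \in F_2\} > 1 - \sqrt{\epsilon}$ and the
conditional probability

$$\pi_r\{\alpha = \beta \mid \alpha = \alpha_0\} > 1 - \sqrt{\epsilon} \quad \text{for } \alpha_0 \in F_2.$$

Then $\pi_r\{\alpha \in F_1 \cap F_2\} > 1 - \epsilon - \sqrt{\epsilon}$, so that
$F_1 \cap F_2$, and hence $F_2$, has at least
$2^{(h - \epsilon)N}(1 - \epsilon - \sqrt{\epsilon}) = J_1$ elements. For
$\alpha_0 \in F_2$, define $E(\alpha_0)$ as the set of all $b \in I(B)^N$ such
that $g(b) = \alpha_0$. The assertion
$\pi_r\{\alpha = \beta \mid \alpha = \alpha_0\} > 1 - \sqrt{\epsilon}$ is
equivalent to

$$Q(r, f(\alpha_0), E(\alpha_0)) > 1 - \sqrt{\epsilon}. \tag{17}$$

Note that, since the sets $E(\alpha_0)$ are disjoint, the elements $f(\alpha_0)$
are distinct, provided $\epsilon < .707$, which we may assume. In summary, for
every $\epsilon > 0$ we have found an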








1220 DAVID BLACKWELL, LEQ BREIMAN AND A. J. THOMASIAN 


$N_1$ such that for any $N \ge N_1$ there are at least $J_1(N, \epsilon)$ distinct
elements of $I(A)^N$ (namely the $f(\alpha_0)$, $\alpha_0 \in F_2$) and
$J_1(N, \epsilon)$ corresponding subsets of $I(B)^N$ (namely the $E(\alpha_0)$)
such that (17) holds. Thus if $G < h$, it is possible to transmit at rate $G$,
since, for sufficiently small $\epsilon$ ($< h - G$),
$J_1(N, \epsilon) > 2^{NG}$ for all sufficiently large $N$. It now follows from
Theorem 4 that $h \le H$, and the proof is complete.

ON THE LIMITING POWER FUNCTION OF THE FREQUENCY
CHI-SQUARE TEST
By SUJIT KUMAR MITRA
University of North Carolina

1. Introduction. Several authors have recently investigated the power function
of the frequency $\chi^2$-test. Eisenhart [1] and Patnaik [2] have obtained large
sample expressions for the power of the simple goodness of fit $\chi^2$-test
(i.e. where the class probabilities are completely specified by the null
hypothesis). The more complicated case, in which the parameters occurring in the
expression for class probabilities require to be estimated, has not received a
unified treatment, although the problem has been treated in a number of specific
situations by different authors, including Patnaik [3], Sillito [4], Stevens [5],
Pearson and Merrington [6], Poti [7], Chiang [8] and Taylor [9].

Due to difficulties in obtaining the power function of the frequency $\chi^2$-test
in the usual manner, Cochran, in an expository article [10] has suggested the 
derivation of its Pitman limiting power [11], and he illustrated it in the case of 
the simple goodness of fit test. The concept of asymptotic power suggested by 
Pitman has also been extensively used in various other areas like nonparametric 
inference (see e.g. Hoeffding and Rosenblatt [12]) and seems to be a useful tool 
for comparing alternative consistent tests or alternative designs for experimenta- 
tion, with regard to their performance in the immediate neighbourhood of the 
null hypothesis. 

The consistency of the frequency $\chi^2$-test has already been established by
Neyman [13]. The object of the present paper is to obtain the Pitman limiting
power of this test when the unknown parameters occurring in the specification
of class probabilities are estimated from the sample by an asymptotically
efficient method like the method of maximum likelihood, minimum $\chi^2$, etc.
In section 5, we discuss a few applications of the Pitman limiting power for
frequency $\chi^2$-tests.


2. Pitman's concept of limiting power [11]. Let $H_0$ be a certain hypothesis
and $\mathfrak{J}$ a test-procedure for testing $H_0$, which determines the
critical region $w_n$ in $R_{N_n}$ (the sample space of $N_n$ dimensions), for
$n = 1, 2, \ldots$, ad inf. Let us assume further that

$$N_{n+1} > N_n \quad \text{for all } n, \tag{2.1}$$

$$0 < \lim_{n \to \infty} \mathrm{Prob}\{w_n \mid H_0\} = \alpha < 1, \tag{2.2}$$

and for any alternative $H$

$$\lim_{n \to \infty} \mathrm{Prob}\{w_n \mid H\} = 1. \tag{2.3}$$

Let $\{H_{0n}\}$ be a family of alternative hypotheses such that

$$\lim_{n \to \infty} \mathrm{Prob}\{w_n \mid H_{0n}\} = \beta(\mathfrak{J}, \{H_{0n}\}) \tag{2.4}$$

exists and $0 < \beta(\mathfrak{J}, \{H_{0n}\}) < 1$.

We call $\beta(\mathfrak{J}, \{H_{0n}\})$ the limiting power of $\mathfrak{J}$
with respect to the family of alternatives $\{H_{0n}\}$.

This concept of limiting power derives its usefulness from the fact that, if
$\mathfrak{J}'$ is any other test procedure, which suggests critical regions
$w_n'$, instead of $w_n$, with $w_n'$ satisfying (2.2) and (2.3), and if

$$\beta(\mathfrak{J}, \{H_{0n}\}) \le \beta(\mathfrak{J}', \{H_{0n}\})$$

then for $n$ sufficiently large

$$\mathrm{Prob}\{w_n \mid H_{0n}\} \le \mathrm{Prob}\{w_n' \mid H_{0n}\}.$$


3. A theorem in frequency chi-square. Suppose that we have $R = \sum_{i=1}^{q} r_i$
functions $p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s)$,
$(i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, r_i)$, of $s < R - q$ parameters
$\alpha_1, \alpha_2, \ldots, \alpha_s$ such that for all points of a
non-degenerate interval $A$ in the $s$-dimensional space of the $\alpha_k$'s the
$p_{ij}$ satisfy the following conditions:

(a) $\sum_{j=1}^{r_i} p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s) = 1$ for $i = 1, 2, \ldots, q$,

(b) $p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s) > c^2 > 0$ for all $(i, j)$,

(c) Every $p_{ij}$ has continuous derivatives $\dfrac{\partial p_{ij}}{\partial \alpha_k}$ and $\dfrac{\partial^2 p_{ij}}{\partial \alpha_k\, \partial \alpha_l}$,

(d) The matrix $D = \left\{ \dfrac{\partial p_{ij}}{\partial \alpha_k} \right\}_{R \times s}$ is of rank $s$.

(We shall assume that the index pairs $(i, j)$, indicating the rows of the above
matrix or of any such matrix we define in future, are arranged in the
lexicographic order.)

For $n = 1, 2, \ldots$, ad inf., let $(N_1^{(n)}, N_2^{(n)}, \ldots, N_q^{(n)})$ be a
sequence of row vectors such that for $i = 1, 2, \ldots, q$, and every $n$,
(i) $N_i^{(n)}$ is a natural number, (ii) $N_i^{(n+1)} > N_i^{(n)}$, (iii) if
$N_n = \sum_{i=1}^{q} N_i^{(n)}$, then $N_i^{(n)}/N_n = Q_i$ independent of $n$.

Let $\alpha^0 = (\alpha_1^0, \alpha_2^0, \ldots, \alpha_s^0)$ be an inner point of
$A$ and let $c_{ij}$ $(i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, r_i)$ be a given set
of numbers such that

$$\sum_{j=1}^{r_i} c_{ij} = 0, \quad \text{for } i = 1, 2, \ldots, q. \tag{3.1}$$

Put

$$p_{ij}^0 = p_{ij}(\alpha_1^0, \alpha_2^0, \ldots, \alpha_s^0) \tag{3.2}$$

and

$$p_{ijn} = p_{ij}^0 + \frac{c_{ij}}{\sqrt{N_n}}. \tag{3.3}$$

Let $n_0$ be a positive integer such that for $n \ge n_0$

$$p_{ijn} > 0 \quad \text{for all } i, j.$$

For $n = n_0, n_0 + 1, \ldots$, ad inf., let $\{\nu_{ijn}\}$
$(i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, r_i)$ be a sequence of $R$-dimensional
random variables such that

$$\mathrm{Prob}\{\nu_{ijn}\} = \prod_{i=1}^{q} \frac{N_i^{(n)}!}{\prod_{j=1}^{r_i} \nu_{ijn}!} \prod_{j=1}^{r_i} p_{ijn}^{\nu_{ijn}} \tag{3.4}$$

if the $\nu_{ijn}$ are any set of non-negative integers (some of which might be
zero) and $\sum_{j} \nu_{ijn} = N_i^{(n)}$, $i = 1, 2, \ldots, q$, and $= 0$
otherwise.

Consider the system of equations:

$$\sum_{i=1}^{q} \sum_{j=1}^{r_i} \frac{\nu_{ijn} - N_n Q_i p_{ij}}{p_{ij}} \frac{\partial p_{ij}}{\partial \alpha_k} = 0, \quad k = 1, 2, \ldots, s. \tag{3.5}$$

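We shall prove

THEOREM 3.1.
(i) The system of equations (3.5) has exactly one system of solutions

$$\hat{\alpha}_n = (\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \ldots, \hat{\alpha}_{sn})$$

such that $\hat{\alpha}_n$ converges in probability to $\alpha^0$ as
$n \to \infty$ (or, in symbols, $\hat{\alpha}_n \to^P \alpha^0$ as
$n \to \infty$).

(ii) The value of $\chi^2$ obtained by inserting $\alpha_k = \hat{\alpha}_{kn}$ in

$$\chi^2 = \sum_{i=1}^{q} \sum_{j=1}^{r_i} \frac{(\nu_{ijn} - N_n Q_i\, p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s))^2}{N_n Q_i\, p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s)} \tag{3.6}$$

is, in the limit as $n \to \infty$, distributed in a non-central
$\chi^2$-distribution ([2], [14]), with $R - s - q$ degrees of freedom and
non-centrality parameter

$$\lambda = \delta'[I - B(B'B)^{-1}B']\delta,$$

where

$$\delta = \left\{ c_{ij} \sqrt{\frac{Q_i}{p_{ij}^0}} \right\}_{R \times 1}$$

and

$$B = \left\{ \sqrt{\frac{Q_i}{p_{ij}^0}} \left( \frac{\partial p_{ij}}{\partial \alpha_k} \right)_0 \right\}_{R \times s}.$$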
PROOF OF (i). We observe that for $\eta > d = \max_{i,j} Q_i |c_{ij}|$,
$|\nu_{ijn} - N_n Q_i p_{ij}^0| \ge \eta \sqrt{N_n}$ implies

$$|\nu_{ijn} - N_n Q_i p_{ijn}| \ge (\eta - Q_i |c_{ij}|) \sqrt{N_n} \ge (\eta - d) \sqrt{N_n}.$$

Hence, using Chebyshev's inequality, we get

$$\mathrm{Prob}\{|\nu_{ijn} - N_n Q_i p_{ij}^0| \ge \eta \sqrt{N_n}\}
 \le \frac{Q_i\, p_{ijn}(1 - p_{ijn})}{(\eta - Q_i |c_{ij}|)^2}
 \le \frac{Q_i\, p_{ijn}}{(\eta - d)^2}.$$

Consequently, the probability that we have
$|\nu_{ijn} - N_n Q_i p_{ij}^0| \ge \eta \sqrt{N_n}$ for at least one subscript
$(i, j)$ is smaller than
$(\eta - d)^{-2} \sum_i Q_i \sum_j p_{ijn} = (\eta - d)^{-2}$. Thus with a
probability greater than $1 - (\eta - d)^{-2}$, we have

$$|\nu_{ijn} - N_n Q_i p_{ij}^0| < \eta \sqrt{N_n} \quad \text{for all } (i, j).$$

If we put

$$x_{ijn} = \frac{\nu_{ijn} - N_n Q_i p_{ij}^0}{\sqrt{N_n Q_i p_{ij}^0}}$$

and $Q = \min_i Q_i$, this will imply that with a probability greater than
$1 - (\eta - d)^{-2}$ we have

$$|x_{ijn}| < \frac{\eta}{c \sqrt{Q}} \quad \text{for all } (i, j). \tag{3.7}$$

The proof of Theorem 3.1 (i) can now be completed using (3.7), as well as
assumptions (a), (b), (c), and (d) and following Cramér's argument ([15],
section 30.3).


PROOF OF (ii). We put

$$y_{ijn} = \frac{\nu_{ijn} - N_n Q_i\, p_{ij}(\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \ldots, \hat{\alpha}_{sn})}{\sqrt{N_n Q_i\, p_{ij}(\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \ldots, \hat{\alpha}_{sn})}},$$

$$X_{(n)} = \{x_{ijn}\}_{R \times 1}, \qquad Y_{(n)} = \{y_{ijn}\}_{R \times 1},$$

$$Z_{(n)} = \{z_{ijn}\}_{R \times 1} = Y_{(n)} - [I - B(B'B)^{-1}B'] X_{(n)}.$$

The proof of Theorem 3.1 (ii) requires the following results.

LEMMA 3.1. $z_{ijn} \to^P 0$ as $n \to \infty$.

Lemma 3.1 can be proved in a manner similar to the proof given in section
30.3 of Cramér's book [15].


LEMMA 3.2. The limiting distribution of $X_{(n)}$ is multivariate normal with mean
$\delta$ and covariance matrix

$$\Lambda_X = I - PP',$$

where

$$P = \{\sqrt{p_{ij}^0}\, \delta_{ii'}\}_{R \times q} \quad (i, i' = 1, 2, \ldots, q;\ j = 1, 2, \ldots, r_i)$$

and $\delta_{ii'}$ is the Kronecker symbol.

A proof of Lemma 3.2 could be constructed again on lines similar to that in
[15] section 30.1 (see also [16] p. 118).

LEMMA 3.3. (Cramér's proposition 22.6 [15]). Suppose that we have for
$\nu = 1, 2, \ldots$

$$y_\nu = A x_\nu + z_\nu,$$

where $x_\nu$, $y_\nu$ and $z_\nu$ are $n$-dimensional random variables, while $A$
is a matrix of order $n \cdot n$ with constant elements. Suppose further that, as
$\nu \to \infty$, the distribution of $x_\nu$ tends to a certain limiting
distribution, while $z_\nu$ converges in probability to zero. Then $y_\nu$ has the
limiting distribution defined by the linear transformation $y = Ax$, where $x$
has the limiting distribution of the $x_\nu$.

LEMMA 3.4. The limiting distribution of $Y_{(n)}$ is multivariate normal with mean

$$\delta'[I - B(B'B)^{-1}B']$$

and covariance matrix

$$\Lambda_Y = [I - B(B'B)^{-1}B'][I - PP'][I - B(B'B)^{-1}B']
 = I - B(B'B)^{-1}B' - PP'$$

(since $B'P = 0$ as may be verified).

Lemma 3.4 is a direct consequence of the previous lemmas.
LEMMA 3.5. There exists an orthogonal matrix $L$ of order $R \cdot R$ such that

$$L'[I - B(B'B)^{-1}B' - PP']L = \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix},$$

where the diagonal blocks are of orders $s + q$ and $R - s - q$ respectively.

To prove Lemma 3.5, we write

$$M\,(R \times 2R) = (B(B'B)^{-1}B' : PP')$$

and observe that

$$B(B'B)^{-1}B' + PP' = MM'.$$

Since Rank $[B(B'B)^{-1}B'] = s$, Rank $[PP'] = q$ and $B'P = 0$ it follows that
Rank $[M] = s + q$. Hence Rank $[B(B'B)^{-1}B' + PP'] = s + q$. But
$B(B'B)^{-1}B' + PP'$ is an idempotent matrix. Hence its only nonzero latent root
is 1, which is thus of multiplicity $s + q$. Therefore, since
$B(B'B)^{-1}B' + PP'$ is a symmetric matrix, there exists an orthogonal matrix

$$L = (L_1 : L_2),$$

with $L_1$ of order $R \times (s + q)$ and $L_2$ of order
$R \times (R - s - q)$, such that

$$L'(B(B'B)^{-1}B' + PP')L = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}.$$

The same matrix $L$ satisfies Lemma 3.5.
If we now make an orthogonal transformation

$$W'_{(n)} = (w_{1n}, w_{2n}, \ldots, w_{Rn}) = Y'_{(n)} L,$$

it will then follow that the limiting distribution of $W_{(n)}$ is multivariate
normal with mean

$$\theta' = \delta'[I - B(B'B)^{-1}B']L$$

and covariance matrix

$$\begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix},$$

with diagonal blocks of orders $s + q$ and $R - s - q$. But

$$B(B'B)^{-1}B' + PP' = (L_1 : L_2) \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} L_1' \\ L_2' \end{pmatrix} = L_1 L_1'.$$

Therefore

$$I - B(B'B)^{-1}B' = I - L_1 L_1' + PP' = L_2 L_2' + PP',$$

since $LL' = L_1 L_1' + L_2 L_2' = I$, and

$$\theta' = \delta'[I - B(B'B)^{-1}B']L
 = \delta'[L_2 L_2' + PP'][L_1 : L_2]
 = \delta'[PP'L_1 : L_2 + PP'L_2]
 = [0 : \delta'L_2],$$

since $\delta'P = 0$.

Thus as $n \to \infty$,

$$w_{in} \to^P 0, \quad \text{for } i = 1, 2, \ldots, s + q,$$

and $w_{s+q+1,n}, w_{s+q+2,n}, \ldots, w_{Rn}$ are asymptotically distributed as
independent normal variates with unit variance and means given by

$$\lim_{n \to \infty} E(w_{s+q+1,n}, w_{s+q+2,n}, \ldots, w_{Rn}) = \delta'L_2.$$

Hence

$$\chi^2 = \sum_{i=1}^{q} \sum_{j=1}^{r_i} \frac{(\nu_{ijn} - N_n Q_i\, p_{ij}(\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \ldots, \hat{\alpha}_{sn}))^2}{N_n Q_i\, p_{ij}(\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \ldots, \hat{\alpha}_{sn})}
 = Y'_{(n)} Y_{(n)} = W'_{(n)} W_{(n)} = \sum_{i=1}^{R} w_{in}^2$$

is, in the limit as $n \to \infty$, distributed as non-central $\chi^2$ with
$R - s - q$ degrees of freedom and noncentrality parameter

$$\lambda = \delta' L_2 L_2' \delta = \delta'(PP' + L_2 L_2')\delta = \delta'(I - B(B'B)^{-1}B')\delta.$$

This completes the proof of Theorem 3.1 (ii). It will be seen that the proof of
Theorem 3.1 given here follows reasoning similar to that in Cramér ([15],
section 30.3). An alternative proof is also possible on the lines of Wald's
derivation (Theorem IX [17]) of the large sample distribution of the likelihood
ratio criterion, with suitable modifications.


4. The limiting power of the frequency $\chi^2$-test. Neyman [13] considers the
following problem:

Consider $q$ sequences of independent trials and let $N_i$ denote the number
of trials in the $i$th sequence. Each trial of the $i$th sequence is capable of
producing one of the $r_i$ mutually exclusive results, say

$$\rho_{i1}, \rho_{i2}, \ldots, \rho_{ir_i},$$

with unknown probabilities

$$p_{i1}, p_{i2}, \ldots, p_{ir_i},$$

where

$$\sum_{j=1}^{r_i} p_{ij} = 1.$$

Denote by $\nu_{ij}$ the number of occurrences of $\rho_{ij}$ in the course of the
$N_i$ trials forming the $i$th sequence.


On the basis of these observations $\{\nu_{ij}\}$ it is desired to test the
hypothesis that these unknown probabilities $p_{ij}$ satisfy certain known
functional relations, e.g.

$$H : p_{ij} = p_{ij}(\alpha_1, \alpha_2, \ldots, \alpha_s),$$

where the $p_{ij}$'s are certain functions satisfying the conditions described in
section 3, and $(\alpha_1, \alpha_2, \ldots, \alpha_s)$ is an unknown parameter
point. Let $\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_s$ be a suitably
chosen solution of

$$\sum_{i=1}^{q} \sum_{j=1}^{r_i} \frac{\nu_{ij} - N_i p_{ij}}{p_{ij}} \frac{\partial p_{ij}}{\partial \alpha_k} = 0, \quad k = 1, 2, \ldots, s, \tag{4.1}$$

and let $\chi^2_{1-\alpha}(u)$ be the upper $\alpha$ percent point of the
$\chi^2$-distribution with $u$ degrees of freedom.
For testing $H$ we compute

$$\chi^2_N = \sum_{i=1}^{q} \sum_{j=1}^{r_i} \frac{(\nu_{ij} - N_i\, p_{ij}(\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_s))^2}{N_i\, p_{ij}(\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_s)}.$$

We reject $H$ if $\chi^2_N > \chi^2_{1-\alpha}(R - s - q)$, and accept otherwise.
Put $N = \sum_{i=1}^{q} N_i$ and $Q_i = N_i/N$. Let $\{c_{ij}\}$, $\delta$ and $B$
be as defined in section 3. Let $F(\chi^2, u, \lambda)$ be the distribution
function of the non-central $\chi^2$ with $u$ degrees of freedom and
non-centrality parameter $\lambda$. Define the hypothesis

$$H_N : p_{ij} = p_{ij}(\alpha_1^0, \alpha_2^0, \ldots, \alpha_s^0) + \frac{c_{ij}}{\sqrt{N}} = p_{ijN} \text{ (say)},$$

where as before $(\alpha_1^0, \alpha_2^0, \ldots, \alpha_s^0)$ is an inner point
of $A$.

From Theorem 3.1, we obtain the limiting power of the $\chi^2_N$-test

$$\beta(\chi^2_N, \{H_N\}) = 1 - F(\chi^2_{1-\alpha}(R - s - q),\ R - s - q,\ \lambda),$$

where $\lambda = \delta'(I - B(B'B)^{-1}B')\delta$.
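Let $d' = (d_1, d_2, \ldots, d_s)$ be any vector of real numbers. When

$$c_{ij} = \sum_{k=1}^{s} d_k \left( \frac{\partial p_{ij}}{\partial \alpha_k} \right)_0$$

it is easily seen that

$$p_{ij}(\alpha_1^0, \alpha_2^0, \ldots, \alpha_s^0) + \frac{c_{ij}}{\sqrt{N}} = p_{ij}(\alpha_{1N}, \alpha_{2N}, \ldots, \alpha_{sN}) + o\!\left(\frac{1}{\sqrt{N}}\right),$$

where $\alpha_{kN} = \alpha_k^0 + d_k/\sqrt{N}$ $(k = 1, 2, \ldots, s)$. In this
case $\delta$ is of the form

$$\delta = B \cdot e,$$

where $e' = (e_1, e_2, \ldots, e_s)$ is another real vector. We have

$$\lambda = e'B'(I - B(B'B)^{-1}B')Be = 0$$

and $\beta(\chi^2_N, \{H_N\}) = \alpha$, as we might expect.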
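The vanishing of $\lambda$ along directions tangent to the hypothesis can be seen in a small numerical case. The sketch below assumes a single sample ($q = 1$, $Q_1 = 1$) and a single parameter ($s = 1$), so that $B$ is one column $b$ and $\lambda = \delta'\delta - (\delta'b)^2/(b'b)$. The trinomial model $p(\theta) = (\theta^2,\ 2\theta(1 - \theta),\ (1 - \theta)^2)$, the value $\theta = 0.5$ and the deviation vectors are hypothetical choices made for illustration, not examples from the paper.

```python
from math import sqrt, isclose

def noncentrality(p, dp, c, Q=1.0):
    """lambda = delta'(I - B(B'B)^{-1}B')delta for one sample (q = 1) and one
    parameter (s = 1), where B reduces to a single column b."""
    delta = [cj * sqrt(Q / pj) for cj, pj in zip(c, p)]   # delta_j = c_j sqrt(Q/p_j)
    b = [dj * sqrt(Q / pj) for dj, pj in zip(dp, p)]      # b_j = sqrt(Q/p_j) dp_j/dtheta
    dd = sum(x * x for x in delta)
    db = sum(x * y for x, y in zip(delta, b))
    bb = sum(y * y for y in b)
    return dd - db * db / bb

theta = 0.5
p = [theta**2, 2 * theta * (1 - theta), (1 - theta)**2]   # class probabilities
dp = [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]         # derivatives w.r.t. theta

# A deviation that leaves the model (its components sum to zero, as (3.1) requires).
lam_general = noncentrality(p, dp, c=[1.0, -2.0, 1.0])
# A deviation tangent to the model: c_j = d * dp_j/dtheta with d = 3.
lam_tangent = noncentrality(p, dp, c=[3.0 * d for d in dp])
```

With these values `lam_general` is positive, whereas `lam_tangent` is zero up to rounding, so the limiting power against the tangent alternative reduces to the size $\alpha$.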
5. Applications. (1) Planning of experiments for comparing two distribution
functions.

To test the hypothesis that two random variables $x_1$ and $x_2$ have identical
probability distributions, the test procedure commonly adopted consists in
making a sequence of $N_i$ independent observations on the random variable
$x_i$ $(i = 1, 2)$. At each observation we observe the numerical value assumed by
the random variable and according to this classify the results of each sequence
into $r$ measurable mutually exclusive and exhaustive groups (same for both
the sequences).

Let $\nu_{ij}$ denote the number of observations of the $i$th sequence belonging
to the $j$th group $(i = 1, 2;\ j = 1, 2, \ldots, r)$, so that
$\sum_{j=1}^{r} \nu_{ij} = N_i$ $(i = 1, 2)$. The hypothesis desired to be tested
is equivalent to the hypothesis $H^*$ that there are $r$ positive constants
$p_1, p_2, \ldots, p_r$ with $\sum_{j=1}^{r} p_j = 1$ such that the probability of
a random observation belonging to the $j$th group is equal to $p_j$ for both the
sequences. (We assume that the groups are so chosen that each of them has a
positive probability measure at least w.r.t. one of the distributions.)

If this hypothesis $H^*$ is true, the maximum likelihood estimates of $p_j$ will
be given by $\hat{p}_j = \nu_{\cdot j}/N$, where
$\nu_{\cdot j} = \nu_{1j} + \nu_{2j}$ and $N = N_1 + N_2$. Hence for testing the
hypothesis we compute

$$\chi^2_{H^*} = \sum_{i=1}^{2} \sum_{j=1}^{r} \frac{(\nu_{ij} - Q_i \nu_{\cdot j})^2}{Q_i \nu_{\cdot j}}. \tag{5.1}$$

We reject the hypothesis if

$$\chi^2_{H^*} > \chi^2_{1-\alpha}(r - 1),$$

and accept it otherwise.

Let us now assume that it costs $C_i$ dollars to make an observation on $x_i$
$(i = 1, 2)$. Since both $N_1$ and $N_2$ are at our disposal it seems now natural
to inquire how best we could allocate our total sampling budget of $S$ dollars to
the two populations, or, more precisely, could we determine the ratio
$N_1/(N_1 + N_2) = Q_1$ which will maximize the power of the above test with
respect to all alternatives violating the hypothesis $H^*$, and at the same time
ensure that the sampling cost does not exceed $S$ dollars. Due to reasons already
stated earlier in this paper, we cannot provide an answer to this question with
our existing knowledge. However, if we agree to accept the limiting power
function as our criterion for choosing 'the best', we might seek if the best
possible sampling plan exists in the sense of maximizing the limiting power.

Let $c_{ij}$ $(i = 1, 2;\ j = 1, 2, \ldots, r)$ be any given set of deviation
parameters such that

$$\sum_{j=1}^{r} c_{ij} = 0, \quad i = 1, 2, \quad \text{and for at least one } j,\ c_{1j} \ne c_{2j}.$$

Let us denote by $H_S$ the hypothesis

$$H_S : p_{ij}(S) = p_j + \frac{c_{ij}}{\sqrt{S}}.$$

If we decide to take $N_1$ and $N_2$ in the ratio $Q_1 : (1 - Q_1)$ then the total
sample size will be given by

$$N = \frac{S}{C_1 Q_1 + C_2 Q_2}, \quad \text{where } Q_2 = 1 - Q_1.$$

Hence $H_S$ may be rewritten as

$$H_S : p_{ij} = p_j + \frac{c_{ij}}{\sqrt{C_1 Q_1 + C_2 Q_2}\, \sqrt{N}}.$$

From Theorem 3.1, we obtain the limiting power of the $\chi^2_{H^*}$-test

$$\beta(\chi^2_{H^*}, \{H_S\}) = 1 - F(\chi^2_{1-\alpha}(r - 1),\ r - 1,\ \lambda_{H^*}),$$

where

$$\lambda_{H^*} = \delta'(I - B(B'B)^{-1}B')\delta.$$

After some simplification $\lambda_{H^*}$ reduces to

$$\lambda_{H^*} = \frac{Q_1 Q_2}{C_1 Q_1 + C_2 Q_2} \sum_{j=1}^{r} \frac{(c_{1j} - c_{2j})^2}{p_j}.$$

Since for given $x$ and $u$, $F(x, u, \lambda)$ is a strictly monotonic decreasing
function of $\lambda$, the maximum limiting power is attained when
$\lambda_{H^*}$ is maximum, that is when

$$Q_1 = \frac{\sqrt{C_2}}{\sqrt{C_1} + \sqrt{C_2}}.$$

Thus to maximize the limiting power the best possible sampling plan, at the
specified budget, is given by

$$N_1 = \left[ \frac{S}{\sqrt{C_1}\,(\sqrt{C_1} + \sqrt{C_2})} \right]$$

and

$$N_2 = \left[ \frac{S}{\sqrt{C_2}\,(\sqrt{C_1} + \sqrt{C_2})} \right],$$

where $[x]$ denotes the largest integer not exceeding $x$.
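The allocation rule can be tried out numerically. The sketch below is illustrative only: the function names and the budget and cost figures are assumptions, and `lambda_factor` computes just the part of $\lambda_{H^*}$ that depends on $Q_1$, namely $Q_1 Q_2/(C_1 Q_1 + C_2 Q_2)$.

```python
from math import sqrt, floor

def optimal_allocation(S, C1, C2):
    """Return (Q1, N1, N2) for budget S and unit costs C1, C2."""
    r1, r2 = sqrt(C1), sqrt(C2)
    Q1 = r2 / (r1 + r2)
    N1 = floor(S / (r1 * (r1 + r2)))
    N2 = floor(S / (r2 * (r1 + r2)))
    return Q1, N1, N2

def lambda_factor(Q1, C1, C2):
    """The Q1-dependent factor Q1*Q2 / (C1*Q1 + C2*Q2) of the noncentrality."""
    Q2 = 1 - Q1
    return Q1 * Q2 / (C1 * Q1 + C2 * Q2)

# Hypothetical figures: 1000 dollars, 4 dollars per observation on x_1, 1 on x_2.
S, C1, C2 = 1000, 4, 1
Q1, N1, N2 = optimal_allocation(S, C1, C2)
best = lambda_factor(Q1, C1, C2)
grid = [lambda_factor(k / 100, C1, C2) for k in range(1, 100)]
total_cost = C1 * N1 + C2 * N2
```

For these figures the cheaper population receives twice as many observations as the dearer one, and no value of $Q_1$ on the grid gives a larger factor than `best`.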
(2) Planning of experiments to detect shifts in response.

Consider the following problem discussed by McNemar [18] who was interested
in ascertaining the effectiveness of an interpolated experience like a movie or a
lecture in shifting individual responses to certain stimuli. Let us take the
simple situation in which every individual responds to the stimuli in one of two
different ways (say, '0' or '1'). Let $\pi_{ij}$ denote the proportion of
individuals in the population who give response '$i$' before the interpolated
experience and response '$j$' after it $(i = 0, 1;\ j = 0, 1)$.

Write

$$\pi_{i\cdot} = \pi_{i0} + \pi_{i1} \quad (i = 0, 1),$$

$$\pi_{\cdot j} = \pi_{0j} + \pi_{1j} \quad (j = 0, 1).$$

We shall say there is no shift in response if

$$H_0 : \pi_{1\cdot} = \pi_{\cdot 1}$$

is true.

To test this hypothesis one can conceive of at least two alternative ways of
experimentation:

(a) two samples, each of size $n$, are selected independently, one from the
pre-experience group and the other from the post-experience group. The test for
the equality of proportions then is easily seen to be a particular case of the
test given earlier in this section under Application (1). Let us denote the
chi-square obtained for this test by $\chi^2_a$.

(b) the same set of individuals, $n$ in number, selected from the pre-experience
group is again examined after the experience, and the results classified in a
$2 \times 2$ table as follows:

                                 Post-experience response
                                   0          1        total
    Pre-experience       0       n_00       n_01       n_0.
    response             1       n_10       n_11       n_1.
                       total     n_.0       n_.1       n

Under procedure (b), to test $H_0$, we compute

$$\chi^2_b = \frac{(n_{10} - n_{01})^2}{n_{10} + n_{01}}$$

and reject $H_0$ only if $\chi^2_b > \chi^2_{1-\alpha}(1)$. Let us denote by
$H_{0n}$ the hypothesis

$$H_{0n} : \pi_{ij} = \pi_{ij}^0 + \frac{c_{ij}}{\sqrt{n}},$$

where $\sum \pi_{ij}^0 = 1$, $\pi_{1\cdot}^0 = \pi_{\cdot 1}^0 = \pi$ (say),
$\sum c_{ij} = 0$, and $c_{01} \ne c_{10}$. From Theorem 3.1, we obtain after
certain algebraic simplification:

$$\beta(\chi^2_a, \{H_{0n}\}) = 1 - F(\chi^2_{1-\alpha}(1), 1, \lambda_a)$$

and

$$\beta(\chi^2_b, \{H_{0n}\}) = 1 - F(\chi^2_{1-\alpha}(1), 1, \lambda_b),$$

where

$$\lambda_a = \frac{(c_{10} - c_{01})^2}{2(\pi_{10}^0 + \pi_{11}^0)(\pi_{00}^0 + \pi_{01}^0)}$$

and

$$\lambda_b = \frac{(c_{10} - c_{01})^2}{\pi_{10}^0 + \pi_{01}^0}.$$

Since $\pi_{10}^0 = \pi_{01}^0$ under $H_0$, the denominator in $\lambda_a$ can be
rewritten as $2\{\pi_{10}^0 - \pi_{10}^0 \pi_{01}^0 + \pi_{00}^0 \pi_{11}^0\}$.
Hence $\lambda_b >$, $<$ or $= \lambda_a$, according as
$(\pi_{00}^0 \pi_{11}^0 - \pi_{01}^0 \pi_{10}^0) >$, $<$ or $= 0$ respectively.
This shows that at least from the point of view of maximising limiting power,
procedure (b) would be superior to procedure (a) when the association between the
two response types, as measured by
$\pi_{00}^0 \pi_{11}^0 - \pi_{01}^0 \pi_{10}^0$, is positive; inferior to (a) when
it is negative; and equivalent to (a) when it is zero.
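For one degree of freedom the limiting power can be evaluated with the normal distribution alone, because a non-central $\chi^2$ variate with one degree of freedom and parameter $\lambda$ is distributed as $(Z + \sqrt{\lambda})^2$ with $Z$ standard normal. The sketch below rests on that fact and on the expressions for $\lambda_a$ and $\lambda_b$ as read here from the text; the function names and the table of proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def limiting_power_1df(lam, alpha=0.05):
    """1 - F(chi2_{1-alpha}(1), 1, lam) via the standard normal distribution."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)      # square root of the critical value
    m = sqrt(lam)
    return 1 - (nd.cdf(z - m) - nd.cdf(-z - m))

def noncentralities(pi0, c):
    """Return (lambda_a, lambda_b) for proportions pi0[(i, j)] and deviations c[(i, j)]."""
    d2 = (c[(1, 0)] - c[(0, 1)]) ** 2
    pi = pi0[(1, 0)] + pi0[(1, 1)]     # common marginal proportion under H_0
    lam_a = d2 / (2 * pi * (1 - pi))
    lam_b = d2 / (pi0[(1, 0)] + pi0[(0, 1)])
    return lam_a, lam_b

# Hypothetical null proportions with positive association (0.4*0.4 > 0.1*0.1).
pi0 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
c = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 1.0, (1, 1): 0.0}
lam_a, lam_b = noncentralities(pi0, c)
power_a = limiting_power_1df(lam_a)
power_b = limiting_power_1df(lam_b)
```

With positive association `lam_b` exceeds `lam_a`, and so does the corresponding limiting power, in line with the comparison of procedures (a) and (b).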
SOME EXACT RESULTS FOR THE FINITE DAM
By N. U. PRABHU
University of Western Australia, and Karnatak University, India


1. Summary. In the discrete finite dam model due to Moran, the storage
process $\{Z_t\}$ is known to be a Markov chain. Stationary distributions of
$Z_t$ are obtained for the cases where the release is a unit amount of water per
unit time, and the input is of (i) geometric, (ii) negative binomial and
(iii) Poisson type.

The paper concludes with a discussion of the problem of emptiness in the finite
dam and considers the probability that, starting with an arbitrary storage, the
dam becomes empty before it overflows.


2. Introduction. This paper is concerned with a storage system whose
probability model is due to Moran [9]. The storage $Z_t$ of a dam of finite
capacity $K$ is defined for discrete time $t$ $(t = 0, 1, 2, \ldots)$ as the dam
content just after an instantaneous release at $t$, and just before an input
$X_t$ flows into it over the time-interval $(t, t + 1)$. The model is subject to
the conditions that

(i) the inputs $X_t$ during the intervals $(t, t + 1)$ are independently and
identically distributed;

(ii) there is an overflow $\max(Z_t + X_t - K, 0)$ during the interval
$(t, t + 1)$, a quantity $\min(K, Z_t + X_t)$ being left in the dam just before
the release occurs; and

(iii) the amount of water released at time $t + 1$ is $\min(M, Z_t + X_t)$ where
$M$ is a constant ($< K$).

A fuller description of the model and further references on the subject are given
by Gani [3]. It is seen that the stochastic processes $\{Z_t\}$ and
$\{Z_t + X_t\}$ are both Markov chains, and the problem of obtaining their
stationary distributions, given the probability distribution of the input, is of
some interest. Moran ([9], [10]) and Gani and Moran [4] have obtained a few
approximate solutions to this problem by numerical methods, and some important
observations on the solution in the general case have been made by Moran [11],
but the only exact solution known so far is the one due to Moran [10] for the
case of the geometric input. The problem is considerably simplified when
$K = \infty$, i.e. when the dam is of infinite capacity; it is then seen (Gani and
Prabhu, [5]) that the transition-matrix of the Markov chain $\{Z_t + X_t\}$ also
occurs in the theory of queues in connection with the length of a queue at epochs
just before service. For this case Bailey [1] has obtained, by the method of
probability generating functions (p.g.f.), the stationary distributions arising
from a given distribution of $X_t$. A dam of finite capacity $K$ can be
considered as the analogue of a queueing system in which there is accommodation
for only $K$ customers to wait, those in excess of $K$ being compelled to leave
the queue altogether (as may happen, for instance, in an airport of limited
capacity); we proceed to obtain for such a dam the stationary distribution of the
storage $Z_t$.


3. Stationary distribution of the storage. We shall be concerned with the case
where $M$, the amount of water released at time $t$, is unity. Let $\{g_j\}$ be
the probability distribution of $X_t$, so that

$$\Pr\{X_t = j\} = g_j, \quad (j = 0, 1, 2, \ldots). \tag{1}$$

We assume that $g_j > 0$ for all $j$. Also, let

$$G(z) = \sum_{j=0}^{\infty} g_j z^j, \quad |z| \le 1, \tag{2}$$

be the p.g.f. of $\{g_j\}$, and

$$\rho = G'(1) = \sum_{j=0}^{\infty} j g_j \tag{3}$$

the mean input. The transition-matrix of the Markov chain $\{Z_t\}$ is
$P = \{P_{ij}\}$, where, with rows and columns indexed by the states
$0, 1, \ldots, K - 1$,

$$P = \begin{pmatrix}
g_0 + g_1 & g_2 & \cdots & g_{K-1} & h_K \\
g_0 & g_1 & \cdots & g_{K-2} & h_{K-1} \\
0 & g_0 & \cdots & g_{K-3} & h_{K-2} \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & g_0 & h_1
\end{pmatrix} \tag{4}$$

where $h_i = \sum_{j=i}^{\infty} g_j$ $(i = 1, 2, \ldots, K)$. Clearly, the chain
is irreducible and contains a finite number $K$ of states, so that the stationary
probability distribution $\{u_i\}$ $(i = 0, 1, \ldots, K - 1)$ exists, where the
$u_i$ are the unique solutions of the equations

$$u_j = \sum_{i=0}^{K-1} u_i P_{ij}, \quad (j = 0, 1, \ldots, K - 1) \tag{5}$$

together with $\sum_{i=0}^{K-1} u_i = 1$. We first prove the following theorem due
to Moran [11].

THEOREM.
(i) If $\{u_i^K\}$ $(i = 0, 1, \ldots, K - 1)$ is the stationary probability
distribution of storage in a dam of capacity $K$, then the ratios

$$v_i = u_i^K / u_0^K \tag{6}$$

are independent of $K$, and

(ii) the $v_i$'s can be found as the coefficients of $z^i$ in $V(z)$, where

$$V(z) = \frac{g_0 (1 - z)}{G(z) - z}. \tag{7}$$

The first part of the theorem is easily proved; in fact, writing out the
equations (5) in full we obtain

$$u_0 = (g_0 + g_1) u_0 + g_0 u_1,$$
$$u_1 = g_2 u_0 + g_1 u_1 + g_0 u_2,$$
$$\cdots$$
$$u_{K-2} = g_{K-1} u_0 + g_{K-2} u_1 + \cdots + g_0 u_{K-1},$$
$$u_{K-1} = h_K u_0 + h_{K-1} u_1 + \cdots + h_1 u_{K-1}.$$

Solving these equations successively for the ratios $v_i = u_i/u_0$ we obtain

$$v_1 = \frac{1 - g_0 - g_1}{g_0}, \qquad v_2 = \frac{1 - g_1}{g_0}\, v_1 - \frac{g_2}{g_0}, \tag{8}$$

and in general, the $v_i$'s $(i = 1, 2, \ldots, K - 1)$ are seen to be
independent of $K$.


Now consider the function $V(z)$ defined by (7). We shall first prove that $V(z)$
can be expanded as a power series which is convergent for suitable values of
$|z|$. Let us first consider the case $\rho \le 1$. Writing

$$G(z) - z = (1 - z) \left\{ 1 - \frac{1 - G(z)}{1 - z} \right\}$$

and following Kendall ([6], p. 159) we obtain

$$\frac{1 - G(z)}{1 - z} = \sum_{n=0}^{\infty} z^n \sum_{i=n+1}^{\infty} g_i,$$

so that, for $|z| < 1$,

$$\left| \frac{1 - G(z)}{1 - z} \right| < \sum_{n=0}^{\infty} \sum_{i=n+1}^{\infty} g_i = \sum_{i=1}^{\infty} i g_i = \rho \le 1.$$

Hence $|G(z) - z| \ne 0$, and we have the power series expansion

$$V(z) = g_0 \left\{ 1 - \frac{1 - G(z)}{1 - z} \right\}^{-1} = v_0 + v_1 z + v_2 z^2 + \cdots,$$

convergent for $|z| < 1$.


Next, let p > 1. In this case there exists « positive \ such that the power 
series expansion 


1 


a3 * "te as + +> 
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is valid for |z| < A (Knopp, [8], p. 182). Hence it follows that V(z) also possesses 
‘t power series expansion convergent for |z| < X. 
Thus whether or not $\rho \le 1$, $V(z)$ has a power series expansion

\[ V(z) = \frac{g_0(1 - z)}{G(z) - z} = v_0 + v_1 z + v_2 z^2 + \cdots. \tag{9} \]

The coefficients $v_i$ are determined from the relation

\[ g_0(1 - z) = \{G(z) - z\} \sum_{i=0}^{\infty} v_i z^i, \]

and hence it is seen that $v_0 = 1$, and $v_1, v_2, \cdots, v_{K-1}$ satisfy the relations (8).
Thus they are, in fact, the quantities defined in (6).

If $\rho < 1$, the stationary probability distribution exists in the case of the
infinite chain ($K = \infty$), and its g.f. is proportional to $V(z)$. However, the above
results hold, as we have shown, even when $\rho \ge 1$. It is now obvious that the
general method of obtaining the stationary probability distribution $\{u_i\}$ for the
discrete dam of finite capacity $K$ consists of (i) finding $V(z)$, (ii) expanding $V(z)$ to
obtain the $v_i$'s, and (iii) normalising $v_0, v_1, \cdots, v_{K-1}$ to obtain a probability
distribution. We proceed to do this in some particular cases.
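
The three steps above can be sketched numerically. The following is a minimal illustration, not part of the original paper: it obtains the $v_i$ by equating coefficients in $g_0(1 - z) = \{G(z) - z\}\sum v_i z^i$, normalises them, and builds the transition matrix implied by the stationary equations written out above so that the result can be checked. The function names and the list `g` (holding $g_0, \cdots, g_{K-1}$) are assumptions made for this sketch.

```python
import math

def dam_stationary(g, K):
    """Stationary distribution u_0..u_{K-1} from the coefficients of V(z).

    g[j] = Pr{X = j} for j = 0..K-1 (hypothetical input format); K >= 2.
    """
    v = [1.0]  # v_0 = 1
    for m in range(1, K):
        # coefficient of z^m in g_0(1 - z) = (G(z) - z) * sum_i v_i z^i
        s = v[m - 1] - sum(g[j] * v[m - j] for j in range(1, m + 1))
        if m == 1:
            s -= g[0]
        v.append(s / g[0])
    total = sum(v)
    return [x / total for x in v]

def transition_matrix(g, K):
    """Transition matrix of the storage chain on states 0..K-1 (unit release)."""
    P = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(K - 1):
            idx = j - i + 1          # input needed to move from i to j
            if idx >= 0:
                P[i][j] += g[idx]
        P[i][K - 1] = 1.0 - sum(g[:K - i])   # h_{K-i}: the dam fills up
    P[0][0] += g[0]                  # an empty dam with no input stays empty
    return P

# Usage sketch: Poisson input with mean 0.8 and capacity K = 6.
rho, K = 0.8, 6
g = [math.exp(-rho) * rho ** j / math.factorial(j) for j in range(K)]
u = dam_stationary(g, K)
```

Only $g_0, \cdots, g_{K-1}$ are needed, because the tail probabilities $h_j$ enter the last equation alone and that equation is implied by the others after normalisation.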


3.1. Geometric input. Consider, for instance, an input distribution of the
geometric type,

\[ g_j = \Pr\{X_t = j\} = a b^j, \qquad (j = 0, 1, \cdots), \tag{10} \]

where $0 < a < 1$ and $b = 1 - a$. The p.g.f. of $X_t$ is then

\[ G(z) = \frac{a}{1 - bz}, \tag{11} \]

and the function $V(z)$ is given by

\[ V(z) = \frac{a(1 - z)}{a(1 - bz)^{-1} - z} = \frac{1 - bz}{1 - \rho z} = (1 - bz)\sum_{i=0}^{\infty} \rho^i z^i, \qquad \{|z| < \min(1, 1/\rho)\}, \tag{12} \]

where $\rho = b/a$ is the mean input. Hence we obtain

\[ v_0 = 1, \qquad v_i = \rho^i - b\rho^{i-1} = b\rho^i, \qquad (i = 1, 2, \cdots, K - 1) \]

and

\[ \sum_{i=0}^{K-1} v_i = 1 + b\sum_{i=1}^{K-1} \rho^i = a\,\frac{1 - \rho^{K+1}}{1 - \rho}. \]

The stationary distribution in this case is therefore given by

\[ u_0 = \frac{1 - \rho}{a(1 - \rho^{K+1})}, \qquad u_i = \frac{\rho^{i+1}(1 - \rho)}{1 - \rho^{K+1}}, \qquad (i = 1, 2, \cdots, K - 1). \tag{13} \]
Thus the storage of a dam of finite capacity $K$ into which flows an input of the
geometric type has a stationary distribution of the geometric type, which is
truncated at $Z = K - 1$ and has a modified initial term. This result is implied
in Moran's solution (referred to in Section 2) for the general case $M > 1$,
although it is not explicitly mentioned by him; for $M = 1$ his solution is given by
the formulae wo = m + m, Ui = Tir (0 1, 2,---, K — 1), where 
ro lo 2 r—2 0 3 
Wr-/TK = Si = So a oe S3 Se 


(14) as n— 1 ‘ n— 2\ .1—n : 
\ S, = ¥* ~. (r = 1,2,---, 
‘ c ~ i)! c i)! dithe ” 


From this we obtain (13) after some simple reduction.
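
As a small check of the closed form (13), the sketch below evaluates it directly. It is illustrative only: the function name is an assumption, and the case $\rho = 1$ is excluded because (13) as written divides by $1 - \rho^{K+1}$.

```python
def geometric_dam_stationary(a, K):
    """Stationary probabilities u_0..u_{K-1} from (13) for input g_j = a b^j."""
    b = 1.0 - a
    rho = b / a                      # mean input
    if abs(rho - 1.0) < 1e-12:
        raise ValueError("(13) as written requires rho != 1")
    denom = 1.0 - rho ** (K + 1)
    u = [(1.0 - rho) / (a * denom)]  # modified initial term u_0
    u += [rho ** (i + 1) * (1.0 - rho) / denom for i in range(1, K)]
    return u

u = geometric_dam_stationary(0.7, 6)
```

For $i \ge 1$ successive terms have the constant ratio $\rho$, while $u_1 / u_0 = b\rho$, which is the "modified initial term" referred to in the text.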


3.2. Negative binomial input. Consider next the more general case of the
negative binomial input,

\[ g_j = \Pr\{X_t = j\} = \binom{n + j - 1}{j} a^n b^j, \qquad (j = 0, 1, \cdots), \tag{15} \]

where $0 < a < 1$, $b = 1 - a$, and $n$ is a positive integer; the p.g.f. of $X_t$ is
then

\[ G(z) = \frac{a^n}{(1 - bz)^n} \]

and the mean input is $\rho = nb/a$. We have then

\[ V(z) = \frac{a^n(1 - z)}{a^n(1 - bz)^{-n} - z} = \frac{a^n(1 - z)(1 - bz)^n}{a^n - z(1 - bz)^n}. \tag{16} \]


Obviously $z = 1$ is a zero of the denominator of the expression on the right hand
side of (16); in addition to this it has $n$ other zeros $z_1, z_2, \cdots, z_n$. We consider
here the case where $z_1, z_2, \cdots, z_n$ are all distinct and different from unity;
however, the general case can be treated along similar lines. When $(1, z_1, z_2, \cdots, z_n)$
are all different we can break up $V(z)$ into partial fractions of the form

\[ V(z) = d_0 + \sum_{p=1}^{n} \frac{d_p}{1 - z/z_p}, \tag{17} \]

where obviously $d_0 = a^n$ and the $d_p$'s are given by

\[ d_p = \lim_{z \to z_p}\left(1 - \frac{z}{z_p}\right)V(z) = \lim_{z \to z_p} \frac{a^n(1 - z)(1 - bz)^n(1 - z/z_p)}{a^n - z(1 - bz)^n} = \frac{a^n(1 - 1/z_p)}{n b z_p/(1 - b z_p) - 1}. \tag{18} \]

Now let $\lambda$ be the least among the quantities $1, |z_1|, |z_2|, \cdots, |z_n|$; then for $|z| < \lambda$
we can express each term under the summation sign in (17) as a power series.
Thus

\[ V(z) = d_0 + \sum_{p=1}^{n} d_p\left(1 - \frac{z}{z_p}\right)^{-1} = d_0 + \sum_{i=0}^{\infty} z^i \sum_{p=1}^{n} d_p\left(\frac{1}{z_p}\right)^i, \]

whence we obtain

\[ v_0 = d_0 + \sum_{p=1}^{n} d_p = \lim_{z \to 0} V(z) = 1, \qquad v_i = \sum_{p=1}^{n} \frac{d_p}{(z_p)^i}, \quad (i = 1, 2, \cdots, K - 1), \tag{19} \]

so that

\[ \sum_{i=0}^{K-1} v_i = d_0 + \sum_{p=1}^{n} d_p \sum_{i=0}^{K-1} \left(\frac{1}{z_p}\right)^i = d_0 + \sum_{p=1}^{n} d_p\, \frac{1 - (1/z_p)^K}{1 - (1/z_p)}. \]

It follows that the stationary probabilities $u_i$ are given by

\[ u_0 = \left\{ d_0 + \sum_{p=1}^{n} d_p\, \frac{1 - (1/z_p)^K}{1 - (1/z_p)} \right\}^{-1}, \qquad u_i = u_0 \sum_{p=1}^{n} d_p \left(\frac{1}{z_p}\right)^i, \quad (i = 1, 2, \cdots, K - 1). \tag{20} \]

From (20) we see that the stationary distribution of the dam storage is the
weighted sum of $n$ geometric distributions, each of which is truncated at $Z =
K - 1$, and has a modified initial term.


3.3. Poisson input. Finally we consider the case where the input has the
Poisson distribution with mean $\rho$,

\[ g_j = \Pr\{X_t = j\} = e^{-\rho}\frac{\rho^j}{j!}, \qquad (j = 0, 1, \cdots). \tag{21} \]

The rigorous procedure here consists of writing down $V(z)$ and obtaining the
coefficients $v_i$ by complex variable methods. We shall, however, argue
heuristically and consider (21) as the limiting case of the negative binomial (15) as
$n \to \infty$, $a \to 1$ and $\rho = nb/a$ is held fixed. In fact, putting $a = 1/(1 + \rho/n)$,
$b = \rho/(n + \rho)$, it is seen that the p.g.f. of (15) reduces to

\[ \left(1 + \frac{\rho}{n} - \frac{\rho z}{n}\right)^{-n} \to e^{-\rho(1 - z)}, \]

which is the p.g.f. of (21). Also, $d_0 \to e^{-\rho}$, and

\[ d_p = \frac{a^n(1 - 1/z_p)}{\rho a z_p/(1 - b z_p) - 1} \to \frac{e^{-\rho}(1 - 1/z_p)}{\rho z_p - 1}, \]

where $z_1, z_2, \cdots$ are the roots (other than unity) of the equation

\[ e^{-\rho(1 - z)} = z \]

(which are infinite in number). Hence the stationary probabilities of the dam
storage are given by

\[ u_0 = \left\{ e^{-\rho} + \sum_{p=1}^{\infty} \frac{e^{-\rho}(1 - 1/z_p)}{\rho z_p - 1} \cdot \frac{1 - (1/z_p)^K}{1 - (1/z_p)} \right\}^{-1}, \]

\[ u_i = u_0 \sum_{p=1}^{\infty} \frac{e^{-\rho}(1 - 1/z_p)}{\rho z_p - 1} \left(\frac{1}{z_p}\right)^i, \qquad (i = 1, 2, \cdots, K - 1). \]


4. The problem of emptiness in the finite dam. The analogy between the
dam process and the random walk has already been pointed out by several
authors (see the discussion in [3]). In fact, putting $U_t = X_t - 1$, we see that the
storage $Z_t$ in a dam of capacity $K$ satisfies the relations

\[ Z_{t+1} = \begin{cases} Z_t + U_t & \text{if } 0 < Z_t + U_t < K - 1 \\ 0 & \text{if } Z_t + U_t \le 0 \\ K - 1 & \text{if } Z_t + U_t \ge K - 1 \end{cases} \tag{24} \]

which, however, define a random walk with impenetrable barriers at $Z = 0$
and $Z = K - 1$. If $K = \infty$, there is only the first barrier and the problem of
'duration of the game' (i.e. the distribution of time required for the dam to
become empty for the first time) has been discussed by Kendall [7] for the case
where the input is of the Gamma type and the release is continuous. For finite $K$
this problem is much more difficult; however, for this case we propose to
discuss the probability of absorption at $Z = 0$ (i.e. the conditional probability $V_i$
that, starting with a storage $Z_0 = i$, the dam becomes empty ($Z_t = 0$) before
it overflows). This is a familiar problem in random walk theory, and has been
discussed, for instance, by Feller ([2], pp. 300-303); it is seen that the
probabilities $V_i$ $(i = 1, 2, \cdots, K - 2)$ satisfy the relations

\[ V_1 = \sum_{j=1}^{K-2} g_j V_j + g_0, \qquad V_i = \sum_{j=i-1}^{K-2} g_{j-i+1} V_j, \quad (i = 2, 3, \cdots, K - 2). \]

These equations simplify to some extent if we note that the states 0 and $K - 1$
are absorbing, so that $V_0 = 1$, $V_{K-1} = 0$; for we can then write

\[ V_i = \sum_{j=i-1}^{K-1} g_{j-i+1} V_j, \qquad (i = 1, 2, \cdots, K - 2). \tag{25} \]

Clearly, the coefficients on the right hand side of these equations correspond to
the rows of the transition-matrix (4). It will now be found easiest to start at the
bottom right hand corner and work up to the left: thus

\[ g_0 V_{K-3} + g_1 V_{K-2} + h_2 \cdot 0 = V_{K-2}, \]

so that

\[ V_{K-3} = \frac{1 - g_1}{g_0}\, V_{K-2}, \]

and similarly

\[ V_{K-4} = \frac{1 - g_1}{g_0}\, V_{K-3} - \frac{g_2}{g_0}\, V_{K-2}, \]

etc. This shows that the ratios of the quantities

\[ w_i = V_{K-1-i} \tag{26} \]

are again independent of $K$ ($w_0 = 0$, $w_{K-1} = 1$); rewriting the equations (25)
in terms of these quantities we obtain

\[ w_i = \sum_{j=0}^{i} g_j w_{i-j+1}, \qquad (i = 1, 2, \cdots, K - 2). \tag{27} \]

Consider the system of equations (27) for $i = 1, 2, \cdots$ ad inf., and put

\[ W(z) = \sum_{i=1}^{\infty} \frac{w_i}{w_1}\, z^{i-1}; \tag{28} \]

we have

\[ zW(z) = \sum_{i=1}^{\infty} z^i \sum_{j=0}^{i} g_j \frac{w_{i-j+1}}{w_1} = \sum_{j=0}^{\infty} g_j z^j \sum_{i=1}^{\infty} \frac{w_i}{w_1}\, z^{i-1} - g_0 = G(z)W(z) - g_0, \]

whence we obtain the relation

\[ W(z) = \frac{g_0}{G(z) - z}. \tag{29} \]

Following the same lines of argument as for $V(z)$, we can prove that $W(z)$ can
be expanded as a power series convergent for suitable values of $|z|$. Let $W(z) =
\sum_{i=0}^{\infty} W_{i+1} z^i$; then since $W_{K-1} = w_{K-1}/w_1 = 1/w_1$, we must have

\[ w_i = \frac{W_i}{W_{K-1}}, \qquad (i = 1, 2, \cdots, K - 2) \tag{30} \]

which are, therefore, the required solutions to the equations (27). The
absorption probabilities $V_i$ can then be obtained from (26).

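Let us now consider the particular case where the input is geometric with
probabilities $g_j = ab^j$, $(j = 0, 1, 2, \cdots)$, and $G(z) = a(1 - bz)^{-1}$; then (29)
gives

\[ W(z) = \frac{a}{a(1 - bz)^{-1} - z} = \frac{1 - bz}{(1 - z)(1 - \rho z)} = \begin{cases} \dfrac{a}{1 - \rho}\left\{\dfrac{1}{1 - z} - \dfrac{\rho^2}{1 - \rho z}\right\} & \text{if } \rho \ne 1 \\[2ex] \dfrac{1 - bz}{(1 - z)^2} & \text{if } \rho = 1. \end{cases} \tag{31} \]

Hence it follows that

\[ W_i = \begin{cases} \dfrac{a(1 - \rho^{i+1})}{1 - \rho} & \text{if } \rho \ne 1 \\[2ex] a(i + 1) & \text{if } \rho = 1 \end{cases} \tag{32} \]

and

\[ w_i = \begin{cases} \dfrac{1 - \rho^{i+1}}{1 - \rho^{K}} & \text{if } \rho \ne 1 \\[2ex] \dfrac{i + 1}{K} & \text{if } \rho = 1 \end{cases} \qquad (i = 1, 2, \cdots, K - 2). \tag{33} \]

Thus the absorption probabilities $V_i$ in the case of the geometric input are given
by

\[ V_i = \begin{cases} \dfrac{1 - \rho^{K-i}}{1 - \rho^{K}} & \text{if } \rho \ne 1 \\[2ex] 1 - \dfrac{i}{K} & \text{if } \rho = 1 \end{cases} \qquad (i = 1, 2, \cdots, K - 2). \tag{34} \]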
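
The route from (29) to the absorption probabilities can be sketched for an arbitrary input distribution. The code below is an illustration under stated assumptions rather than the author's procedure: it expands $W(z) = g_0/\{G(z) - z\}$ by equating coefficients in $\{G(z) - z\}W(z) = g_0$, then applies (30) and (26). The function name and the list `g` (holding at least $g_0, \cdots, g_{K-2}$) are hypothetical.

```python
def absorption_probabilities(g, K):
    """Return [V_0, ..., V_{K-1}] with V_0 = 1 and V_{K-1} = 0 (K >= 2).

    g[j] = Pr{X = j}; only g[0..K-2] are used.
    """
    W = [0.0, 1.0]                   # W[0] is a placeholder; W_1 = 1
    for m in range(1, K - 1):        # fills W_2, ..., W_{K-1}
        s = W[m] - sum(g[j] * W[m + 1 - j] for j in range(1, m + 1))
        W.append(s / g[0])
    # w_i = W_i / W_{K-1} and V_i = w_{K-1-i} for i = 1, ..., K-2
    inner = [W[K - 1 - i] / W[K - 1] for i in range(1, K - 1)]
    return [1.0] + inner + [0.0]

# Usage sketch: geometric input with a = 0.6, capacity K = 6.
a, b, K = 0.6, 0.4, 6
g = [a * b ** j for j in range(K)]
V = absorption_probabilities(g, K)
```

For geometric input the values returned for $i = 1, \cdots, K - 2$ should agree with (34); for other inputs the same recursion applies, though no closed form is claimed here.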


A similar procedure could be used, when the input is of a more general type,
to obtain the exact expressions for the probabilities $V_i$. However, in many
cases, it may suffice to know the bounds within which the $V_i$ lie, and these bounds
are given by Feller ([2], inequalities 8.11 and 8.12 on p. 303). In fact, noting that
$E(U_t) = E(X_t - 1) = \rho - 1$, where $\rho$ is the mean input, we have that





 a-1 _ ,f 
_-— <V;s1 ifp<1 
20 — 1 
eons -— 
(35) _ eA SVis% ifp>t 
= 
ime £9,81 ifp =1 
where $z_0$ is the unique positive root (other than unity) of the equation
$\sum_j \Pr\{U_t = j\} z^j = 1$, i.e. $\sum_{j=0}^{\infty} g_j z^j = z$, and $z_0 \gtrless 1$ according as $\rho \lessgtr 1$.


5. Concluding remarks and acknowledgements. When the input $X_t$ has a
continuous probability distribution, it is seen that the stationary distribution
function of $Z_t + X_t$ satisfies an integral equation, which has been solved by the
author in a recent paper (Prabhu, [12]) for the special case when the input
distribution is of the Gamma type. A more realistic problem on which some work
is in progress at the moment is the one dealing with the finite dam process in
continuous time; however, our solutions for discrete time may be taken as useful
approximations to this continuous case.


I am indebted to Dr. J. Gani and the referee for many helpful suggestions. 
MINIMAX ESTIMATION FOR LINEAR REGRESSIONS¹


By R. RADNER 


University of California, Berkeley 


1. Introduction and Summary. When estimating the coefficients in a linear
regression it is usually assumed that the covariances of the observations on the
dependent variable are known up to multiplication by some common positive
number, say c, which is unknown. If this number c is known to be less than some
number k, and if the set of possible distributions of the dependent variable
includes “enough” normal distributions (in a sense to be specified later) then the
minimum variance linear unbiased estimators of the regression coefficients (see
[1]) are minimax among the set of all estimators; furthermore these minimax
estimators are independent of the value of k. (The risk for any estimator is here
taken to be the expected square of the error.) This fact is closely related to a
theorem of Hodges and Lehmann ([3], Theorem 6.5), stating that if the
observations on the dependent variable are assumed to be independent, with variances
not greater than k, then the minimum variance linear estimators corresponding
to the assumption of equal variances are minimax.

For example, if a number of observations are assumed to be independent, with 
common (unknown) mean, and common (unknown) variance that is less than 
k; and if, for every possible value of the mean, the set of possible distributions of 
the observations includes the normal distribution with that mean and with 
variance equal to k; then the sample mean is the minimax estimator of the mean 
of the distribution. 

The assumption of independence with common unknown variance is, of 
course, essentially no less general than the assumption that the covariances are 
known up to multiplication by some common positive number, since the latter 
situation can be reduced to the former by a suitable rotation of the coordinate 
axes (provided that the original matrix of covariances is non-singular). 

This note considers the problem of minimax estimation, in the general “linear
regression” framework, when less is known about the covariances of the
observations on the “dependent variable” than in the traditional situation just
described. For example, one might not be sure that these observations are
independent, nor feel justified in assuming any other specific covariance structure. It
is immediately clear that, from a minimax point of view, one cannot get along
without any prior information at all about the covariances, for in that case the
risk of every estimator is unbounded. In practice, however, one is typically
willing to grant that the covariances are bounded somehow, but one may not
have a very precise idea of the nature of the bound. One is therefore led to look
for different ways of bounding the covariances, in the hope that the minimax
estimators are not too sensitive to the bound.

Unfortunately, in the directions explored here, the minimax estimator is 
sensitive to the “form” of the bound, although once the form has been chosen 
the minimax estimator does not depend on the “magnitude” of the bound. This 
result thus provides an instance in which the minimax principle is not too ef- 
fective against the difficulties due to vagueness of the statistical assumptions of a 
problem, although this is a type of situation in which it has often been successful 
(see Savage in [4], pp. 168-9). 

In this note, two ways of bounding the covariances are considered. The first is 
equivalent to choosing a coordinate system for the ‘‘dependent variables,’’ and 
placing a bound on the characteristic roots of the matrix of covariances of the 
coordinates, in terms of one of a certain class of metrics (e.g., placing a bound on
the trace of the covariance matrix, or on its largest characteristic root). The
second way consists of choosing a coordinate system, and then placing a bound 
on the variance of each coordinate. 

In the first situation, the minimum variance linear unbiased estimator cor- 
responding to the case of uncorrelated coordinates, with equal variances, turns 
out to be minimax; this minimax estimator is, in general, different for different 
choices of coordinate system, but does not depend on the “magnitude’’ of the 
bound. Also, the minimax loss typically decreases at the rate of the reciprocal of 
the sample size. 

In the second situation, the minimax procedures derived here involve ignoring 
most of the observations, and applying a linear unbiased estimator to the rest. 
Again, the minimax procedure depends upon the choice of coordinate system; 
furthermore, in this case the minimax loss typically either does not approach
zero with increasing sample size, or does so much more slowly than the reciprocal
of the sample size.

Thus the minimax estimator appears to be less unsatisfactory in the first 
situation than in the second, but in both cases it depends upon the choice of 
coordinate system, which is a disadvantage if there is no “natural’’ coordinate 
system intrinsic to the regression problem being considered. 

Section 2 below presents the formulation of the problem, and a basic lemma. 
Sections 3 and 4 explore the two ways of bounding the covariances just men- 
tioned. Some examples are given in Section 5. I am indebted to R. R. Bahadur, 
L. J. Savage, and G. Debreu for their helpful comments. 


2. Problem formulation and a basic lemma. Let $y$ be a random $N$-dimensional
column vector, with a distribution $p$ that is known to be in some family $P$ of
distributions. Let $m_p = Ey$ denote the mean of the distribution $p$, and suppose
that one is required to estimate the value of $f'm_p$ on the basis of a single
observation on $y$, where $f$ is given. It is assumed that the loss due to incorrect estimation
is the square of the error. In this note minimax estimators of $f'm_p$ will be
derived under two different assumptions about $P$; both assumptions have the
following form:


Let $T$ be a given $N \times M$ matrix; let $C_p$ denote the covariance of $p$, i.e.,

\[ C_p = E(y - m_p)(y - m_p)'; \]

and let $H$ be a given set of $N \times N$ covariance matrices.


(2.1) For every $p$ in $P$, the mean $m_p = Tx$ for some $M$-dimensional vector $x$,
and $C_p$ is in $H$.


(2.2) For every $x$, and every $C$ in $H$, there is a normal distribution in $P$ with
mean $Tx$ and covariance $C$.


The assumption that $P$ includes normal distributions is a natural one, since
normality can rarely be ruled out as preposterous.

If $a$ is any estimator, then the risk, or expected loss, associated with using $a$
is, for any $p$, given by

\[ r(a, p) = E[a(y) - f'm_p]^2 = E[a(y) - Ea(y)]^2 + [Ea(y) - f'm_p]^2. \tag{2.3} \]

An estimator $\hat a$ is minimax if, for every estimator $a$,

\[ \sup_{p \in P} r(\hat a, p) \le \sup_{p \in P} r(a, p). \]

Because of the convexity of the risk function, it is not necessary to consider 
randomized estimators (see [3], Theorem 3.2). 

Relative to a given covariance $C$, an estimator $a$ is said to be minimum variance
linear unbiased, or more briefly, Markoff, if


(2.4) $a(y) = a'y$ (linearity).

(2.5) For every $p$ in $P$, $Ea'y = f'm_p$ (unbiasedness).

(2.6) If $\beta$ is any estimator satisfying (2.4) and (2.5), then for every $p$ in $P$ with
covariance $C$, $r(a, p) \le r(\beta, p)$.


The significance of the Markoff estimators in this problem is that, in both 
cases considered in this note, there is a Markoff estimator, relative to some C 
in H, that is minimax. 

It follows from (2.1) that a linear estimator $a$ is unbiased if and only if $T'a =
T'f$; and from (2.3) that the risk for a linear unbiased estimator is $a'C_p a$.
Therefore, a linear estimator $a$ is Markoff relative to $C$ if and only if it minimizes $a'Ca$
subject to the constraint $T'a = T'f$.

It might be noted here that it follows from (2.3) that the standard definition
of a Markoff estimator given above is equivalent to another one in which
condition (2.5) (unbiasedness) is replaced by the following (bounded risk):


(2.5') The risk $E(a'y - f'm_p)^2$ is bounded as $p$ varies in the class of all $p$ in $P$
that have covariance $C$, for any given $C$.


The idea of replacing the constraint of unbiasedness by the constraint of bounded 
risk is close to the minimax spirit, and seems to be due to L. J. Savage. 

The main tool that will be used is the following lemma, which is closely related
to a theorem of Hodges and Lehmann ([3], Theorem 6.5), and is stated here without
proof.

LEMMA. If $\hat a$ is Markoff relative to $\hat C$ in $H$, and if $\hat a' C \hat a \le \hat a' \hat C \hat a$ for every $C$ in $H$,
then $\hat a$ is minimax.

In the “classical” situation to which the general Markoff theorem on least
squares is applied (see, for example, Aitken [1]), it is assumed that the covariance
of the distribution $p$ is known up to multiplication by a positive constant, i.e.,
that the covariance is $cC$, where $C$ is known but $c$ is not. If it is further assumed
that $c$ is bounded by some number $k$, then it follows immediately from the Lemma
that the Markoff estimator relative to $kC$ is minimax. Note that the Markoff
estimator is independent of $k$.

On the other hand, if nothing at all is known about the covariance of p, i.e., 
if $H$ is taken to be the class of all $N \times N$ covariance matrices, then the risk for
every estimator is unbounded. To get a finite minimax value, the class H must 
be “bounded” in some sense, and the next two sections explore two directions 
in which such a bound can be defined. In each case it should be borne in mind 
that postulated assumptions are thought of as applying after, possibly, an ap- 
propriate transformation of the coordinate system. 


3. The case of bounds in terms of characteristic roots. In this section minimax
estimators are derived for the problem formulated in Section 2, when the
covariances are bounded in certain ways in terms of their characteristic roots.

For any covariance matrix $C$, let $r_i$ denote its characteristic roots (these will
be non-negative real numbers). For any number $q \ge 1$, the $q$-norm of $C$ is
defined here to be

\[ N(C; q) = \Big( \sum_i r_i^q \Big)^{1/q}. \]

For $q = 1$, 2, and $\infty$, one gets the trace of $C$, the square root of the sum of squares
of the elements of $C$, and the largest characteristic root of $C$, respectively. Note
that for the identity matrix $I$, $N(I; q) = N^{1/q}$.

THEOREM 1. Let $q$ and $k$ be given such that $1 \le q \le \infty$ and $k > 0$, and let $H$ be
the set of all covariances $C$ such that $N(C; q) \le k$; then for the estimation problem
described in Section 2, the Markoff estimator $\hat a$ relative to the identity matrix² is
minimax, and the minimax loss is $k \hat a' \hat a$.

² Strictly speaking, relative to the identity times an appropriate constant, since the
identity may not be in $H$.

PROOF. The idea of the proof is to show that the covariance of rank one that
concentrates all the variance in the direction of $f'y$ is least favorable. Let $B =
\hat a \hat a' / \hat a' \hat a$. Note that $N(B; q) = 1$. Since $\hat a$ is that unbiased linear estimator with
minimum length, any unbiased linear estimator is of the form $\hat a + d$, where
$\hat a' d = 0$. Hence for all unbiased linear estimators $b$,

\[ b'Bb = \hat a' B \hat a = \hat a' \hat a; \]

in particular, $\hat a$ is Markoff with respect to $B$, and to $kB$.
Let $C$ be any covariance in $H$, and let $r$ be its largest characteristic root; then

\[ \hat a' C \hat a \le r\, \hat a' \hat a = N(C; \infty)\, \hat a' \hat a \le N(C; q)\, \hat a' \hat a \le k\, \hat a' B \hat a. \tag{3.1} \]

The theorem now follows from the lemma, equation (3.1), and the fact that $\hat a$
is Markoff relative to $kB$.

For the case $q = 1$, it can be shown that the minimax estimator is not unique,
but it is not known whether it is unique for $q > 1$. However, the Markoff
estimator $\hat a$ of Theorem 1 is the only linear minimax estimator, which can be seen
as follows. A linear minimax estimator $d$ must have bounded risk, and therefore
must be unbiased. Suppose $d$ is different from $\hat a$, and let $D = dd'/d'd$; then

\[ k d'Dd = k d'd > k \hat a' \hat a, \]

i.e., the risk for $d$ against the covariance $kD$ is greater than the minimax risk.

Note that it follows immediately from Theorem 1, that if the characteristic
roots of the covariance matrices in $H$ are defined relative to any fixed symmetric
positive definite matrix $Q$, then the Markoff estimator relative to $Q$ will be
minimax.


4. The case of bounds on the variances of the coordinates. In this section
minimax estimators are found for the problem of Section 2 in the case in which
the class $H$ of covariances is delimited by bounding the variances of given linear
functions of the random vector, in other words, by choosing a particular
coordinate system and bounding the variances of the coordinates.

THEOREM 2. Let $k_1, \cdots, k_N$ be $N$ given positive numbers; let $H$ be the set of
covariances $C$ such that $c_{ii} \le k_i^2$ for $i = 1, \cdots, N$; then any $\hat a$ that minimizes
$\sum_i k_i |a_i|$ subject to $T'a = T'f$ (unbiasedness) is a minimax estimator for the
problem of Section 2, and $\hat c^2 = (\sum_i k_i |\hat a_i|)^2$ is the minimax loss.

PROOF. There is no loss of generality in assuming that $k_i = 1$ for every $i$. As in
Theorem 1, one is led to look for a least favorable covariance matrix among those
of rank 1.


Let $U$ be the set of linear unbiased estimators; for any $C$ in $H$ and $b$ in $U$,

\[ b'Cb = \sum_{i,j} b_i b_j c_{ij} \le \sum_{i,j} |b_i b_j| (c_{ii} c_{jj})^{1/2} \le \sum_{i,j} |b_i b_j| = \Big( \sum_i |b_i| \Big)^2. \tag{4.1} \]

Let $\hat a$ be any vector that minimizes $\sum_i |a_i|$ in $U$, and let $\hat c = \sum_i |\hat a_i|$. By
equation (4.1), and the lemma, the present theorem is proved if a vector $\hat e$ can be
found such that (1) $\hat a$ is Markoff against $\hat E = \hat e \hat e'$; (2) $\hat E$ is in $H$, i.e., $\hat e_i^2 \le 1$ for
every $i$; and (3) the risk for $\hat a$ against $\hat E$ equals $\hat c^2$.
To this end, let $S$ be the set of all vectors $b$ such that $\sum_i |b_i| \le \hat c$. $S$ is a
bounded convex polyhedron, and the intersection of $S$ with $U$ is contained in
the boundary of $S$, by the definition of $\hat c$. Hence there is a hyperplane supporting
$S$ that contains $U$, i.e., there is a vector $\hat e$ such that

(4.2) $b'\hat e = \hat c$, for all $b$ in $U$,
(4.3) $b'\hat e \le \hat c$, for all $b$ in $S$

(see, for example, [2], p. 4).

By (4.2), $\hat e$ satisfies conditions (1) and (3) above. By the definition of $S$, any
vector with one coordinate equal in absolute value to $\hat c$, and all other coordinates
zero, is in $S$. Hence, by (4.3), $\hat c |\hat e_i| \le \hat c$, for every $i$, so that $|\hat e_i| \le 1$ for every
$i$; thus condition (2) above is also satisfied, which completes the proof.

Note that Theorem 2 characterizes all the linear minimax estimators, which is
easily seen by an argument similar to that which follows Theorem 1.

5. Examples.

1. Suppose that the random variables $y_1, \cdots, y_N$ each have the same mean
$x$, which is to be estimated, and assume that the sum of the variances of the $y_i$
is not greater than $k$. To apply Theorem 1, take $T$ to be the $N \times 1$ matrix
whose elements are all equal to 1, $f$ to be any vector for which $\sum f_i = 1$ (e.g.,
$(1, 0, \cdots, 0)$), and $q = 1$. It follows that a minimax estimate of $f'm_p = x$ is the
arithmetic mean of $y_1, \cdots, y_N$, i.e. $\hat a = (1/N, \cdots, 1/N)$, and the minimax loss
is $k \sum \hat a_i^2 = k/N$. This minimax estimator is, of course, the Markoff estimator
for the situation in which it is known that the $y_i$ are independent, with equal
variances.

The same result would be obtained if it were assumed that the variance of any
linear combination $\sum b_i y_i$ such that $\sum b_i^2 = 1$ is not greater than $k$ (the case
$q = \infty$).

2. Consider the estimation problem of Example 1, except now assume that
the variance of $y_i$ is not greater than $k_i$, $i = 1, \cdots, N$. By Theorem 2, a
minimax estimator is given by

\[ \hat a_i = \begin{cases} 1, & \text{for that } i \text{ for which } k_i \text{ is minimum}, \\ 0, & \text{otherwise}, \end{cases} \tag{5.1} \]

and the minimax loss is $\min_i k_i$. Note that in this example the minimax loss is
independent of the sample size $N$, except insofar as $\min_i k_i$ depends upon $N$. If
$k_1 = \cdots = k_N$, then any linear unbiased estimator is minimax.

3. Suppose it is required to estimate the slope $e$ in the linear regression of one
variable on another, and it is assumed that the variance of the “dependent
variable” is not greater than $k^2$. To apply Theorem 2, take

\[ T' = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ t_1 & t_2 & \cdots & t_N \end{pmatrix}, \qquad x' = (d, e), \]

where $t_1, \cdots, t_N$ are the values of the “independent variable,” and $d$ and $e$ are
unknown. A bounded risk (unbiased) linear estimator $a$ must satisfy

\[ \sum_i a_i = 0, \qquad \sum_i a_i t_i = 1. \tag{5.2} \]

By Theorem 2, any $\hat a$ that minimizes $\sum |a_i|$ subject to equation (5.2) is a
minimax estimator of $e$. Without loss of generality, $t_N$ can be taken to be the largest
value of $t_i$, and $t_1$ the smallest; then it is not hard to show that the unique
solution of the above minimization problem is

\[ \hat a_i = \begin{cases} -\dfrac{1}{t_N - t_1}, & \text{for } i = 1, \\[1.5ex] \dfrac{1}{t_N - t_1}, & \text{for } i = N, \\[1.5ex] 0, & \text{otherwise}; \end{cases} \tag{5.3} \]

and the minimax loss is $4k^2/(t_N - t_1)^2$. In other words, a minimax estimate of $e$
is obtained by taking the slope of the line passing through the “extreme” points
$(y_1, t_1)$ and $(y_N, t_N)$.

4. Consider the estimation problem of Example 3, but assume that the sum
of the variances of $y_1, \cdots, y_N$ is not greater than $k$. As in Example 1, this
corresponds to taking $q = 1$ in Theorem 1. By Theorem 1 the usual least squares
estimate $\sum (y_i - \bar y)(t_i - \bar t) / \sum (t_i - \bar t)^2$ is a minimax estimate of $e$, and the minimax
loss is $k / \sum (t_i - \bar t)^2$.
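
The contrast between the two slope estimators can be made concrete with a short numerical sketch. It is illustrative only: the helper names are assumptions, the design points $t_i = 0, \cdots, 4$ are arbitrary, and the two loss functions simply evaluate $k \sum a_i^2$ (the Theorem 1 bound with $q = 1$) and $k^2 (\sum |a_i|)^2$ (the Theorem 2 bound with every $k_i = k$).

```python
def least_squares_weights(t):
    """Weights a_i = (t_i - tbar) / sum (t_j - tbar)^2 of the usual slope estimate."""
    tbar = sum(t) / len(t)
    s = sum((ti - tbar) ** 2 for ti in t)
    return [(ti - tbar) / s for ti in t]

def extreme_point_weights(t):
    """Weights that use only the smallest and largest design points."""
    lo, hi = t.index(min(t)), t.index(max(t))
    span = t[hi] - t[lo]
    a = [0.0] * len(t)
    a[lo], a[hi] = -1.0 / span, 1.0 / span
    return a

def is_unbiased_for_slope(a, t, tol=1e-12):
    """Check the two constraints sum a_i = 0 and sum a_i t_i = 1."""
    return abs(sum(a)) < tol and abs(sum(x * y for x, y in zip(a, t)) - 1.0) < tol

def loss_sum_of_variances(a, k):
    """Worst-case risk when the sum of the variances is at most k."""
    return k * sum(x * x for x in a)

def loss_bounded_variances(a, k):
    """Worst-case risk when each variance is at most k^2."""
    return (k * sum(abs(x) for x in a)) ** 2

t = [0.0, 1.0, 2.0, 3.0, 4.0]
ls, ext = least_squares_weights(t), extreme_point_weights(t)
```

Both weight vectors are unbiased for the slope; the least squares weights have the smaller sum of squares, while the extreme-point weights have the smaller sum of absolute values, which is why each is minimax under its own form of bound.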

Suppose further that $t_i = i - 1$ (e.g., think of $t_i$ as successive times), and
consider the transformation (taking successive differences)

\[ z_i = \begin{cases} y_1, & \text{for } i = 1, \\ y_i - y_{i-1}, & \text{for } i = 2, \cdots, N. \end{cases} \tag{5.4} \]

The means of the $z_i$ are

\[ Ez_i = \begin{cases} d, & \text{for } i = 1, \\ e, & \text{for } i = 2, \cdots, N. \end{cases} \tag{5.5} \]

Now assume that the sum of the variances of the new variables $z_i$ is not greater
than $k$; then by Theorem 1 a minimax estimate of $e$ is

\[ \frac{1}{N - 1} \sum_{i=2}^{N} z_i = \frac{y_N - y_1}{N - 1}, \]

and the minimax loss is $k/(N - 1)$, a different result from that obtained before
making the transformation (5.4).
COVARIANCES OF LEAST-SQUARES ESTIMATES WHEN RESIDUALS
ARE CORRELATED¹


By M. M. SIDDIQUI²
University of North Carolina


1. Summary. In this paper we will study the effects on the covariance matrix 
of the least-squares estimates of regression coefficients and on the estimate of the 
residual variance when the usual condition of independence of residuals is 
violated. The cases of linear trend and of regression on trigonometric functions 
will be considered in some detail. 


2. Introduction. Several authors have studied the problem of estimating
regression coefficients when residuals are autocorrelated. We refer here only to the
work of Grenander and Rosenblatt [2, 3, 4]. Grenander [2] gives conditions on the
regression variables for the existence of consistent estimates of the regression
coefficients. He also gives conditions on the residual process under which the
least-squares (L.S.) estimate of a regression coefficient is asymptotically efficient
with respect to the Markov estimate. The covariances of the L.S. estimates as
summarized in a matrix form are well known and are given at the end of section 3.
The exact expression for an individual covariance or variance in the general case
is easily extracted from this matrix and is given in section 4. The variance of
the L.S. estimate in the general case is also given by Grenander [2, (8), p. 258].
Asymptotic expressions for the covariances of these estimates are also available
[2, 4]. However, it seemed desirable to present here, in some detail, exact
expressions or high order approximations to them for the individual variances and
covariances of the L.S. estimates of regression coefficients and for the expectation
of the estimate of residual variance, particularly for the cases of general interest,
in readily usable form, and derived in an elementary fashion. The first term of
each of our expressions coincides with the asymptotic expression given in [2, 4],
when the regression coefficients are made comparable.

Bounds on the covariances of L.S. estimates are also provided in (7).


3. The L.S. estimates. Let $y = x'\beta + \Delta$ be the regression equation, where
$y$ and $\Delta$ are $N \times 1$ column vectors, $\beta$ is a $p \times 1$ column vector, $x$ is a $p \times N$
matrix and a prime is used to denote the transpose of a matrix or a vector. It
is assumed that $N > p$, $x$ is non-stochastic and of rank $p$, and $\Delta$ is a $N(0, \sigma^2 P)$
vector variate, where 0 is a zero vector and $P$ is a positive definite correlation
matrix.

Introducing $c = (xx')^{-1}$, $b = cxy$, $v = y - x'b$, $n = N - p$, and writing $\delta q$
for $q - Eq$, where $q$ is a variate with expected value $Eq$, it is known that $b$ and
$s^2 = v'v/n$ are the least-squares estimates of $\beta$ and $\sigma^2$ respectively. It is also known
that $Eb = \beta$, and

\[ B = E\,\delta b\,\delta b' = \sigma^2 c x P x' c. \]

In case $P = I_N$, where $I_N$ is the $N \times N$ identity matrix, $Es^2 = \sigma^2$ and $B = \sigma^2 c$.


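4. The covariance matrix B. We propose to study the effects on $B$ and $Es^2$
when $P$ is given by

\[ P = I_N + \sum_{k=1}^{N-1} \rho_k (C^k + C'^k), \tag{1} \]

where

\[ C = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}, \tag{2} \]

i.e. when

\[ E\Delta_t \Delta_{t+k} = \sigma^2 \rho_k, \qquad k = 0, 1, \cdots, N - 1; \quad \rho_0 = 1. \tag{3} \]

We have

\[ v = y - x'b = x'(\beta - b) + \Delta = (I_N - x'cx)\Delta \tag{4} \]

as $b = \beta + cx\Delta$. Writing $m = x'cx$, we have $m' = m$ and $m^2 = m$. Hence if $\lambda$
is a characteristic root of $m$, $\lambda = 0$ or 1. Writing “tr” for the trace of a matrix
we obtain tr $m = p$. Now, by simple evaluation

\[ Es^2 = \frac{\sigma^2}{n}\,[N - \operatorname{tr} Pm] = \sigma^2 \left[ 1 - \frac{2}{n} \sum_{k=1}^{N-1} \sum_{t=1}^{N-k} \rho_k\, m_{t,t+k} \right]. \tag{5} \]

Here, if $e$ is a matrix, $e_{ij}$ or $e_{i,j}$ refers to its element in the $i$th row and the $j$th
column.
If we write $d = cx$, we find that

\[ B_{ij} = E\,\delta b_i\,\delta b_j = \sigma^2 \left[ \sum_{t=1}^{N} d_{it} d_{jt} + \sum_{k=1}^{N-1} \sum_{t=1}^{N-k} \rho_k (d_{it} d_{j,t+k} + d_{i,t+k} d_{jt}) \right]. \]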

If, by a proper choice of $x$ or with a suitable transformation on $x$, we make
$xx' = c^{-1} = I_p$, we have $d = x$. Writing $x_i$ for the row vector in the $i$th row of
$x$, we find

\[ B_{ij} - \sigma^2 \delta_{ij} = \sigma^2 \sum_{k=1}^{N-1} \rho_k\, x_i (C^k + C'^k) x_j', \qquad i, j = 1, \cdots, p; \tag{6} \]

where $\delta_{ij} = 0$ if $i \ne j$ and $\delta_{ii} = 1$.
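
Equation (6) can be checked against the matrix form $B = \sigma^2 x P x'$ on a small example. The sketch below is illustrative only and makes several assumptions not in the original: the helper names, the choice $\rho_k = \rho^k$, and the use of the two orthonormal rows of a centred linear trend as the regression matrix $x$.

```python
import math

def ar1_correlation(N, rho):
    """Correlation matrix P with entries rho^{|s - t|}."""
    return [[rho ** abs(s - t) for t in range(N)] for s in range(N)]

def linear_trend_rows(N):
    """Two orthonormal rows: a constant and a centred, scaled time index."""
    a = math.sqrt(N * (N * N - 1) / 12.0)
    x1 = [1.0 / math.sqrt(N)] * N
    x2 = [(t - (N + 1) / 2.0) / a for t in range(1, N + 1)]
    return [x1, x2]

def covariance_direct(x, P, sigma2=1.0):
    """B = sigma^2 x P x', valid when x x' is the identity."""
    p, N = len(x), len(P)
    return [[sigma2 * sum(x[i][s] * P[s][t] * x[j][t]
                          for s in range(N) for t in range(N))
             for j in range(p)] for i in range(p)]

def covariance_via_lags(x, rho_k, sigma2=1.0):
    """B from equation (6): lag-k cross products weighted by rho_k."""
    p, N = len(x), len(x[0])
    B = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            total = 1.0 if i == j else 0.0
            for k in range(1, N):
                # x_i (C^k + C'^k) x_j' written out as a sum over t
                total += rho_k[k] * sum(x[i][t] * x[j][t + k] + x[i][t + k] * x[j][t]
                                        for t in range(N - k))
            B[i][j] = sigma2 * total
    return B

N, rho = 7, 0.5
x = linear_trend_rows(N)
B_direct = covariance_direct(x, ar1_correlation(N, rho))
B_lags = covariance_via_lags(x, [rho ** k for k in range(N)])
```

The lag form avoids building $P$ explicitly and makes visible how each $\rho_k$ contributes to a given $B_{ij}$.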
It has been shown [1, p. 130] that if $A$ is an $N \times N$ real symmetric matrix with
characteristic roots $a_1 \le a_2 \le \cdots \le a_N$, and $u$ and $v$ are $N \times 1$ real vectors,
then, under the conditions $u'u = v'v = 1$, $u'v = 0$, the bilinear form $u'Av$ has
a maximum $(a_N - a_1)/2$ and a minimum $(a_1 - a_N)/2$. Also the quadratic form
$u'Au \le a_N$. Now the maximum characteristic root of $C^k + C'^k$, where $k$ is a
positive integer, is

$$a_N = 2\cos\frac{\pi}{\left[\dfrac{N-1}{k}\right] + 2} \le 2\cos\frac{\pi k}{N + 2k - 1},$$

where $[q]$ denotes the largest integer $\le q$, and the minimum characteristic root
$a_1 = -a_N$ [1, p. 101]. Hence, we obtain

$$\left|B_{ij} - \sigma^2\delta_{ij}\right| \le 2\sigma^2 \sum_{k=1}^{N-1} |\rho_k| \cos\frac{\pi k}{N + 2k - 1} < 2\sigma^2 \sum_{k=1}^{N-1} |\rho_k|. \tag{7}$$

In the case $\rho_k = \rho^k$, $k = 0, 1, \cdots$, where $a = |\rho| < 1$, we have

$$|B_{ij}| < \frac{2a\sigma^2}{1 - a}, \quad i \ne j; \qquad B_{ii} < \sigma^2\,\frac{1 + a}{1 - a}. \tag{8}$$
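The characteristic-root statement used for (7) can be checked numerically. The following is a minimal sketch, not part of the original note: it builds $C^k + C'^k$ directly, estimates its largest root by power iteration on the shifted matrix $A + 2I$, and compares the result with $2\cos\{\pi/([(N-1)/k] + 2)\}$ and with the bound $2\cos\{\pi k/(N + 2k - 1)\}$. The function names are illustrative assumptions.

```python
import math

def shift_sum(N, k):
    """Dense N x N matrix C^k + C'^k: ones on the k-th super- and sub-diagonals."""
    return [[1.0 if abs(i - j) == k else 0.0 for j in range(N)] for i in range(N)]

def largest_root(A, iters=3000):
    """Largest characteristic root of A via power iteration on A + 2I.

    Every row of A sums to at most 2, so all roots lie in [-2, 2] and the
    shifted matrix A + 2I has no negative roots.
    """
    N = len(A)
    v = [1.0] * N
    for _ in range(iters):
        w = [2.0 * v[i] + sum(A[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(N)) for i in range(N)]
    return sum(v[i] * Av[i] for i in range(N))  # Rayleigh quotient (v has unit length)

def root_formula(N, k):
    """2 cos(pi / ([(N - 1)/k] + 2)), with [q] the integer part of q."""
    return 2.0 * math.cos(math.pi / ((N - 1) // k + 2))

def root_bound(N, k):
    """Upper bound 2 cos(pi k / (N + 2k - 1)) used in (7)."""
    return 2.0 * math.cos(math.pi * k / (N + 2 * k - 1))
```

For example, `largest_root(shift_sum(7, 2))` and `root_formula(7, 2)` should agree to many decimal places, and in that case the bound is attained.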
5. Linear trend. Let $N = 2r + 1$ where $r$ is a positive integer and consider
the linear trend in the form

$$y_t = \beta_1 (2r+1)^{-1/2} + \beta_2 (t - r - 1)/a + \Delta_t, \qquad t = 1, \cdots, N, \tag{9}$$

where

$$a^2 = r(r + 1)(2r + 1)/3 = N(N^2 - 1)/12. \tag{10}$$

In the notation of section 3

$$x_{1t} = (2r + 1)^{-1/2}, \qquad x_{2t} = (t - r - 1)/a, \qquad t = 1, \cdots, N,$$

$$b_1 = \sqrt{N}\,\bar{y}, \qquad b_2 = \Big\{\sum_{t=1}^{N} t y_t - (r + 1)\sum_{t=1}^{N} y_t\Big\}\Big/ a. \tag{11}$$

Furthermore

$$c = I_2, \qquad p = 2, \qquad n = N - 2,$$

$$m_{ij} = (x'x)_{ij} = \frac{1}{N} + \frac{3(2i - N - 1)(2j - N - 1)}{N(N^2 - 1)},$$

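$$ns^2 = \sum_{t=1}^{N} y_t^2 - b_1^2 - b_2^2,$$

$$B_{11} = \sigma^2\Big[1 + 2\sum_{k=1}^{N-1}\Big(1 - \frac{k}{N}\Big)\rho_k\Big], \qquad B_{12} = 0, \tag{12}$$

$$B_{22} = \sigma^2\Big[1 + 2\sum_{k=1}^{N-1}\rho_k - \frac{2}{N}\Big(3 + \frac{2}{N^2 - 1}\Big)\sum_{k=1}^{N-1} k\rho_k + \frac{4}{N(N^2 - 1)}\sum_{k=1}^{N-1} k^3\rho_k\Big],$$

$$Es^2 = \sigma^2\Big[1 - \frac{4}{n}\sum_{k=1}^{N-1}\rho_k + \frac{4}{nN}\Big(2 + \frac{1}{N^2 - 1}\Big)\sum_{k=1}^{N-1} k\rho_k - \frac{4}{nN(N^2 - 1)}\sum_{k=1}^{N-1} k^3\rho_k\Big].$$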

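For the case when $\rho_k = \rho^k$, we can evaluate the summations $\sum \rho_k$, $\sum k\rho_k$, etc.
If $N$ is moderately large we may neglect $\rho^N$ and thus find

$$\frac{Es^2}{\sigma^2} \approx 1 - \frac{4\rho}{n(1 - \rho)} + \frac{8\rho}{nN(1 - \rho)^2} - \frac{4(\rho + 4\rho^2 + \rho^3)}{nN(N^2 - 1)(1 - \rho)^4},$$

$$\frac{B_{11}}{\sigma^2} \approx \frac{N - 2\rho - N\rho^2}{N(1 - \rho)^2} = \frac{1 + \rho}{1 - \rho} - \frac{2\rho}{N(1 - \rho)^2}, \tag{13}$$

$$B_{12} = 0, \qquad \frac{B_{22}}{\sigma^2} \approx \frac{1 + \rho}{1 - \rho} - \frac{6\rho}{N(1 - \rho)^2}.$$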
We note that the $b_i$ are independently distributed $N(\beta_i, B_{ii})$ variates, $i = 1, 2$.
If we set 
' , /12b. 
b, = N*h, = y; bo = v 12), 


/N(N? — 1)° 
the estimate of Ly, is given by 
(14) Y,=9+ b(t —r — 1) 


and under the first order autoregressive scheme for A’s, 


> 


s sao 2+ 4 . 120° (; + p\ al 
‘ionie——), «5 & ‘ov (g, by) = 
(15) o; = (5 =p mE TID p)? COV be) = 0. 


Thus 
2 = o | ~ 2 [ 4 IWwi- re 1)° 
3.2 5(3 a Py N?2— ] |. 


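6. Regression on trigonometric functions. Consider

$$y_t = \beta_1/\sqrt{N} + \sqrt{2/N}\,\sum_{i=1}^{q} \beta_{2i} \cos \lambda_i t + \sqrt{2/N}\,\sum_{i=1}^{q} \beta_{2i+1} \sin \lambda_i t + \Delta_t, \qquad t = 1, \cdots, N, \tag{16}$$

where $\lambda_i = 2\pi w_i/N$ and $w_i$ is a positive integer less than $N/2$ for $i = 1, \cdots, q$,
and $w_i \ne w_j$ if $i \ne j$.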


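In the notation of section 3

$$x_{1t} = 1/\sqrt{N}, \qquad x_{2i,t} = \sqrt{2/N}\,\cos \lambda_i t, \qquad x_{2i+1,t} = \sqrt{2/N}\,\sin \lambda_i t, \qquad i = 1, 2, \cdots, q;\ t = 1, 2, \cdots, N,$$

$$xx' = c = I_{2q+1}, \qquad n = N - 2q - 1,$$

$$b_1 = \sqrt{N}\,\bar{y}, \qquad b_{2i} = \sqrt{2/N}\,\sum_{t} y_t \cos \lambda_i t, \qquad b_{2i+1} = \sqrt{2/N}\,\sum_{t} y_t \sin \lambda_i t, \qquad i = 1, \cdots, q, \tag{17}$$

$$m_{ts} = 1/N + (2/N)\sum_{i=1}^{q} \cos (t - s)\lambda_i, \qquad t, s = 1, \cdots, N,$$

$$s^2 = \Big(\sum_{t} y_t^2 - \sum_{i=1}^{2q+1} b_i^2\Big)\Big/ n.$$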
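We find

$$\frac{Es^2}{\sigma^2} = 1 - \frac{2}{n}\sum_{k=1}^{N-1}\Big(1 - \frac{k}{N}\Big)\rho_k - \frac{4}{n}\sum_{k=1}^{N-1}\sum_{i=1}^{q}\Big(1 - \frac{k}{N}\Big)\rho_k \cos k\lambda_i. \tag{18}$$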
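For the covariances of $b_i$ and $b_j$ we obtain

$$B_{11} = \sigma^2\Big[1 + 2\sum_{k=1}^{N-1}\Big(1 - \frac{k}{N}\Big)\rho_k\Big],$$

$$B_{1,2i} = \frac{\sqrt{2}\,\sigma^2}{N}\sum_{k=1}^{N-1}\sum_{t=1}^{N-k}\rho_k\{\cos (t + k)\lambda_i + \cos t\lambda_i\}, \qquad i = 1, \cdots, q,$$

$$B_{2i,2i} = \sigma^2\Big[1 + \frac{4}{N}\sum_{k=1}^{N-1}\sum_{t=1}^{N-k}\rho_k \cos (k + t)\lambda_i \cos t\lambda_i\Big], \qquad i = 1, \cdots, q, \tag{19}$$

$$B_{2i,2j+1} = \frac{2\sigma^2}{N}\sum_{k=1}^{N-1}\sum_{t=1}^{N-k}\rho_k\{\cos t\lambda_i \sin (k + t)\lambda_j + \cos (k + t)\lambda_i \sin t\lambda_j\}, \qquad i, j = 1, \cdots, q.$$

$B_{1,2i+1}$ and $B_{2i+1,2i+1}$ are obtainable from the expressions for $B_{1,2i}$ and $B_{2i,2i}$
respectively by replacing cosine by sine.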


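If $\rho_k = \rho^k$, and $\rho^N$ is negligible, we find for the variances, after some reduction,

$$\frac{B_{11}}{\sigma^2} \sim \frac{1 + \rho}{1 - \rho} - \frac{2\rho}{N(1 - \rho)^2},$$

$$\frac{B_{2i,2i}}{\sigma^2} \sim \frac{1 - \rho^2}{1 - 2\rho\cos\lambda_i + \rho^2} - \frac{2\rho\cos\lambda_i}{N(1 - 2\rho\cos\lambda_i + \rho^2)} - \frac{2\rho\{(1 + \rho^2)\cos\lambda_i - 2\rho\}}{N(1 - 2\rho\cos\lambda_i + \rho^2)^2}, \tag{20}$$

$$\frac{B_{2i+1,2i+1}}{\sigma^2} \sim \frac{1 - \rho^2}{1 - 2\rho\cos\lambda_i + \rho^2} + \frac{2\rho\cos\lambda_i}{N(1 - 2\rho\cos\lambda_i + \rho^2)} - \frac{2\rho\{(1 + \rho^2)\cos\lambda_i - 2\rho\}}{N(1 - 2\rho\cos\lambda_i + \rho^2)^2}.$$

Also

$$\frac{Es^2}{\sigma^2} \sim 1 - \frac{2\rho}{n(1 - \rho)} - \frac{4\rho}{n}\sum_{i=1}^{q}\frac{\cos\lambda_i - \rho}{1 - 2\rho\cos\lambda_i + \rho^2} + O\Big(\frac{1}{N^2}\Big). \tag{21}$$
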

7. Concluding remarks. We conclude with the remark that in most practical
cases the correlation matrix for the $\Delta$'s will not be known. However, if the $\Delta$'s may be
represented as a stationary autoregressive process of some small order (in many
cases a first or second order scheme gives a reasonably good fit) we would be
required to estimate only a few parameters $\rho_1, \rho_2, \cdots, \rho_k$. We note, however, that
these quantities do not appear in $b$ and $s^2$, only in $B$ and $Es^2$.

We further observe that the estimates, $\hat\beta$ and $\hat\sigma^2$, of $\beta$ and $\sigma^2$ obtained from
maximizing the likelihood function will depend on the parameters of $P$, i.e. on
$\rho_1, \rho_2, \cdots, \rho_{N-1}$, which will mean using sample serial correlation coefficients
to estimate the $\rho$'s in the expressions for $\hat\beta$ and $\hat\sigma^2$. These estimates will obviously be
non-linear. Thus it seems more desirable to keep to the least-squares estimates
$b$ and $s^2$ rather than to attempt to develop maximum-likelihood (or minimum
$\chi^2$) estimates.


8. Acknowledgement. The writer wishes to express his indebtedness to 
Professor Harold Hotelling for drawing his attention to this problem. 
ON A PROBABILITY PROBLEM IN THE THEORY OF COUNTERS
By L. Takács


Research Institute for Mathematics, Hungarian Academy of Sciences, Budapest, 
Hungary 


1. Introduction. Let us suppose that particles arrive at a counter in the time
interval $(0, \infty)$ according to a Poisson process of density $\lambda$. Each particle arriving
in the time interval $(0, \infty)$, independently of the others, gives rise to an
impulse with probability $p$ or 1 according to whether at this instant there is an
impulse present or there is no impulse present. The time durations of the impulses
are identically distributed independent positive random variables with
distribution function $H(x)$, and these random variables are independent of the
instants of the arrivals and of the events of the realizations of the impulses.
We define as "registered particles" those particles which occur at an instant
when there is no impulse present. Denote by $\nu_t$ the number of the registered particles
in the time interval $(0, t)$. The problem is to determine the distribution law
of $\nu_t$ and its asymptotic behaviour as $t \to \infty$.

The particular case of the above problem, when the time durations of the
impulses are constant, was investigated earlier by G. E. Albert and L. Nelson
[1].


2. The structure of the process. Denote by $\{\tau_n\}$ the sequence of instants at
which particles are registered. We say that the system at any instant $t$ is in state
A when no impulse covers the instant $t$ and in state B otherwise. Then the system
assumes the states A, B, A, B, $\cdots$ alternatingly. Let us denote by $\xi_1$,
$\eta_1$, $\xi_2$, $\eta_2$, $\cdots$ the times spent in states A and B respectively. If the system at
the instant $t$ is in state A, then $t$ is evidently a regeneration point of the process.
Consequently $\{\xi_n\}$ and $\{\eta_n\}$ are independent sequences of identically distributed
positive random variables. Clearly $P\{\xi_n \le x\} = F(x) = 1 - e^{-\lambda x}$ if $x \ge 0$.
Write $P\{\eta_n \le x\} = U(x)$, where $U(x)$ is still unknown. (We use P for the symbol
of probability and E for expectation.) It can easily be seen that the instants
of the transitions A $\to$ B coincide with the instants $\tau_n$ $(n = 1, 2, \cdots)$. Consequently
the time differences $\tau_{n+1} - \tau_n$ $(n = 1, 2, \cdots)$ are identically distributed
independent random variables with distribution function $G(x) = F(x) * U(x)$,
i.e.

$$G(x) = \lambda \int_0^x U(x - y)\, e^{-\lambda y}\, dy, \tag{1}$$

while $P\{\tau_1 \le x\} = F(x)$.


3. Notations. Let us introduce the following Laplace-Stieltjes transforms:

$$\gamma(s) = \int_0^\infty e^{-sx}\, dG(x) \tag{2}$$
Received May 17, 1957; revised May 15, 1958.
and

$$\psi(s) = \int_0^\infty e^{-sx}\, dU(x). \tag{3}$$

By (1) we have

$$\gamma(s) = \frac{\lambda}{\lambda + s}\,\psi(s). \tag{4}$$

Further put

$$\alpha = \int_0^\infty x\, dH(x), \qquad \sigma^2 = \int_0^\infty (x - \alpha)^2\, dH(x), \tag{5}$$

$$\tau = \int_0^\infty x\, dU(x), \qquad \rho^2 = \int_0^\infty (x - \tau)^2\, dU(x), \tag{6}$$

$$A = \int_0^\infty x\, dG(x), \qquad B^2 = \int_0^\infty (x - A)^2\, dG(x). \tag{7}$$

By (1) we clearly have that $A = \tau + (1/\lambda)$ and $B^2 = \rho^2 + (1/\lambda^2)$.
Denote by $P(t)$ the probability that at the instant $t$ the system is in state A
and put

$$\pi(s) = \int_0^\infty e^{-st} P(t)\, dt. \tag{8}$$

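4. Theorems concerning $\nu_t$. In what follows we shall give some general
theorems for $\nu_t$.

1. We have

$$P\{\nu_t \le n\} = 1 - F(t) * G_n(t), \tag{9}$$

where $G_n(x)$ denotes the $n$-fold convolution of $G(x)$ with itself ($G_0(x) = 1$ if
$x \ge 0$ and $G_0(x) = 0$ if $x < 0$). For

$$P\{\nu_t \le n\} = P\{t < \tau_{n+1}\} = 1 - P\{\tau_{n+1} \le t\},$$

and $\tau_{n+1} = \tau_1 + (\tau_2 - \tau_1) + \cdots + (\tau_{n+1} - \tau_n)$ is a sum of independent
random variables.

2. If $A < \infty$, then we have

$$\lim_{T \to \infty} P\{\nu_{T+t} - \nu_T \le n\} = 1 - G^*(t) * G_n(t), \tag{10}$$

where

$$G^*(t) = \begin{cases} \dfrac{1}{A}\displaystyle\int_0^t [1 - G(u)]\, du & \text{if } t \ge 0, \\[2mm] 0 & \text{if } t < 0. \end{cases}$$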
The proof is similar to that of (9), only we must use the result

$$\lim_{T \to \infty} P\{\nu_{T+t} - \nu_T \ge 1\} = G^*(t),$$

which was proved by J. L. Doob [2].


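3. If $B^2 < \infty$, then we have

$$\lim_{t \to \infty} P\left\{\frac{\nu_t - \dfrac{t}{A}}{\sqrt{\dfrac{B^2 t}{A^3}}} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\, du. \tag{11}$$

This can be proved by the aid of the method of W. Feller [3]. (Cf. [5]).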


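4. If $B^2 < \infty$, then we have

$$P\left\{\limsup_{t \to \infty} \frac{\nu_t - \dfrac{t}{A}}{\sqrt{\dfrac{2B^2}{A^3}\, t \log\log t}} = 1\right\} = P\left\{\liminf_{t \to \infty} \frac{\nu_t - \dfrac{t}{A}}{\sqrt{\dfrac{2B^2}{A^3}\, t \log\log t}} = -1\right\} = 1. \tag{12}$$

This can be proved by the aid of the law of the iterated logarithm stated by P.
Hartman and A. Wintner [4].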


5. Applying the strong law of large numbers we obtain

$$P\left\{\lim_{t \to \infty} \frac{\nu_t}{t} = \frac{1}{A}\right\} = 1, \tag{13}$$

(cf. J. L. Doob [2]).
It is easy to see that $E\{\nu_t\} = M(t)$ can be expressed as follows:

$$M(t) = \sum_{n=1}^{\infty} P\{\tau_n \le t\}. \tag{14}$$
n=l 


6. If $A < \infty$, then for any $h > 0$ we have

$$\lim_{t \to \infty} \frac{M(t + h) - M(t)}{h} = \frac{1}{A}, \tag{15}$$

by the theorem of J. L. Doob [2].
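7. If $B^2 < \infty$, then we have

$$\int_0^\infty e^{-st}\, dM(t) = \frac{1}{As} + \frac{B^2 + A^2}{2A^2} - \frac{1}{\lambda A} + o(1) \tag{16}$$

if $s \to 0$. For by (9) and (14) we have

$$\int_0^\infty e^{-st}\, dM(t) = \frac{\lambda}{(\lambda + s)[1 - \gamma(s)]} \tag{17}$$

and

$$\gamma(s) = 1 - sA + \frac{s^2}{2}(B^2 + A^2) + o(s^2)$$

if $s \to 0$.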


8. For the Laplace transform of $P(t)$ we have

$$\pi(s) = \int_0^\infty e^{-st} P(t)\, dt = \frac{1}{(\lambda + s)[1 - \gamma(s)]} \tag{18}$$

and

$$P = \lim_{t \to \infty} P(t) = \frac{1}{\lambda A}. \tag{19}$$

PROOF. As $M(t + \Delta t) = M(t) + \lambda P(t)\Delta t + o(\Delta t)$, we have $M'(t) = \lambda P(t)$,
and thus (18) follows from (17). Now

$$P(t) = 1 - \int_0^t [1 - U(t - x)]\, dM(x), \tag{20}$$

for by the theorem of total probability we have

$$1 - P(t) = \sum_{n=1}^{\infty} \int_0^t [1 - U(t - x)]\, dP\{\tau_n \le x\} = \int_0^t [1 - U(t - x)]\, dM(x),$$

which agrees with (20). By virtue of (15) we obtain from (20)

$$\lim_{t \to \infty} P(t) = 1 - \frac{1}{A}\int_0^\infty [1 - U(x)]\, dx = 1 - \frac{\tau}{A}.$$

Since $\tau = A - (1/\lambda)$, equation (19) follows.


Remark. Taking into consideration that $M'(t) = \lambda P(t)$, we obtain from (20)
the following integral equation for $P(t)$:

$$P(t) = 1 - \lambda\int_0^t [1 - U(t - x)]\, P(x)\, dx. \tag{21}$$

From (18) or from (21) we obtain that

$$\psi(s) = \int_0^\infty e^{-sx}\, dU(x) = \frac{\lambda + s}{\lambda} - \frac{1}{\lambda\,\pi(s)}. \tag{22}$$

To apply the above theorems it remains only to determine $G(x)$, $A$, and $B^2$.
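5. The determination of $G(x)$, $A$, and $B^2$.
THEOREM. If $0 < p \le 1$, then we have

$$\gamma(s) = \int_0^\infty e^{-sx}\, dG(x) = \frac{\lambda p + s}{p(\lambda + s)} - \frac{1}{p(\lambda + s)}\left\{\int_0^\infty \exp\left[-st - \lambda p\int_0^t (1 - H(x))\, dx\right] dt\right\}^{-1}. \tag{23}$$

If $\alpha < \infty$, then

$$A = \frac{e^{\lambda p \alpha} + p - 1}{\lambda p}, \tag{24}$$

and if $\sigma^2 < \infty$, then

$$B^2 = \frac{1}{\lambda^2} - \frac{(e^{\lambda p \alpha} - 1)^2}{(\lambda p)^2} + \frac{2e^{\lambda p \alpha}}{\lambda p}\int_0^\infty \left\{\exp\left[\lambda p\int_t^\infty (1 - H(x))\, dx\right] - 1\right\} dt. \tag{25}$$

If $p = 0$, then $U(x) = H(x)$ and consequently

$$G(x) = \lambda\int_0^x H(x - y)\, e^{-\lambda y}\, dy, \tag{26}$$

$$A = \alpha + \frac{1}{\lambda}, \tag{27}$$

and

$$B^2 = \sigma^2 + \frac{1}{\lambda^2}. \tag{28}$$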

PROOF. Let us consider a new process which is a particular case of the process
defined in the Introduction. Suppose that the density of the underlying Poisson
process is $\lambda^*$ and each particle gives rise to an impulse (with probability $p^* = 1$).
Let $H^*(x) = H(x)$ be the distribution function of the duration of the impulses.
This is the case of a Type II counter. Denote by $\{\xi_n^*\}$ and $\{\eta_n^*\}$ the sequences of
the times spent in states A and B respectively. Clearly $P\{\xi_n^* \le x\} = 1 - e^{-\lambda^* x}$
if $x \ge 0$. Write $P\{\eta_n^* \le x\} = U^*(x)$. Denote by $P^*(t)$ the probability that at
the instant $t$ there is no impulse present. We have shown in [5] that

$$P^*(t) = \exp\left\{-\lambda^*\int_0^t [1 - H(x)]\, dx\right\}. \tag{29}$$
Applying (22) it follows that

$$\psi^*(s) = \int_0^\infty e^{-sx}\, dU^*(x) = \frac{\lambda^* + s}{\lambda^*} - \frac{1}{\lambda^*\,\pi^*(s)}, \tag{30}$$

where

$$\pi^*(s) = \int_0^\infty e^{-st} P^*(t)\, dt = \int_0^\infty \exp\left\{-st - \lambda^*\int_0^t [1 - H(x)]\, dx\right\} dt. \tag{31}$$

Now we observe that, if in this new process $\lambda^* = \lambda p$, then we have

$$U^*(x) = U(x), \tag{32}$$

where $U(x)$ is related to the general process. The equality (32) can easily be
seen if we take into consideration that the arrivals of those particles in the general
process which arrive during a dead time and which give rise to an impulse
form a Poisson process with density $\lambda p$. Accordingly by (30) and (31) we have

$$\psi(s) = \int_0^\infty e^{-sx}\, dU(x) = \frac{\lambda p + s}{\lambda p} - \frac{1}{\lambda p}\left\{\int_0^\infty \exp\left[-st - \lambda p\int_0^t (1 - H(x))\, dx\right] dt\right\}^{-1}, \tag{33}$$

and by (4) we obtain (23), which was to be proved.
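If we introduce for the new process the analogous quantities $M^*(t)$, $A^*$ and
$B^*$ corresponding to (7) and (14), then by (16) we obtain that

$$\int_0^\infty e^{-st}\, dM^*(t) = \lambda^*\int_0^\infty e^{-st} P^*(t)\, dt = \frac{1}{A^* s} + \frac{B^{*2} + A^{*2}}{2A^{*2}} - \frac{1}{\lambda^* A^*} + o(1) \tag{34}$$

if $s \to 0$. Since $P^* = \lim_{t \to \infty} P^*(t) = e^{-\lambda^*\alpha}$, we obtain from (34) that

$$A^* = e^{\lambda^*\alpha}/\lambda^* \tag{35}$$

and, further,

$$B^{*2} = 2\lambda^* A^{*2}\int_0^\infty [P^*(t) - P^*]\, dt - A^{*2} + 2A^*/\lambda^*. \tag{36}$$

If in particular $\lambda^* = \lambda p$, then clearly

$$A - \frac{1}{\lambda} = A^* - \frac{1}{\lambda^*}$$

and

$$B^2 - \frac{1}{\lambda^2} = B^{*2} - \frac{1}{\lambda^{*2}},$$

and thus (24) and (25) are proved. The case $p = 0$ is evident.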
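The mean $A$ of the time between successive registered particles can be illustrated by simulation. The sketch below is an informal check, not part of the original argument: it simulates the counter described in the Introduction and compares the average gap between registrations with the value $A = (e^{\lambda p \alpha} + p - 1)/(\lambda p)$ implied by the relations $A - 1/\lambda = A^* - 1/\lambda^*$, $A^* = e^{\lambda^*\alpha}/\lambda^*$ and $\lambda^* = \lambda p$. The function names and the callable `duration` (one draw from $H$) are assumptions made for the illustration.

```python
import math
import random

def mean_registration_interval(lam, p, alpha):
    """A = (exp(lam*p*alpha) + p - 1) / (lam*p), valid for 0 < p <= 1."""
    return (math.exp(lam * p * alpha) + p - 1.0) / (lam * p)

def simulate_intervals(lam, p, duration, n_registered, seed=0):
    """Simulate the counter; return the gaps between successive registered particles.

    `duration` is a zero-argument callable returning one impulse length.
    """
    rng = random.Random(seed)
    t = 0.0
    covered_until = 0.0      # impulses cover the time axis up to this instant
    last_registered = None
    gaps = []
    while len(gaps) < n_registered:
        t += rng.expovariate(lam)            # next Poisson arrival
        if t >= covered_until:               # state A: particle is registered
            if last_registered is not None:
                gaps.append(t - last_registered)
            last_registered = t
            covered_until = t + duration()   # an impulse starts with probability 1
        elif rng.random() < p:               # state B: impulse with probability p
            covered_until = max(covered_until, t + duration())
    return gaps
```

With $\lambda = 1$, $p = 0.5$ and constant impulse length 1, the sample mean of `simulate_intervals(1.0, 0.5, lambda: 1.0, 100_000)` should lie close to `mean_registration_interval(1.0, 0.5, 1.0)`, which is about 2.297.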
Finally we remark that the more general case when the arrivals of the particles
to the counter form a recurrent process was dealt with by the author [6], [7],
[8], but an explicit solution was given only for a particular distribution $H(x)$.
DISTRIBUTION OF LINEAR CONTRASTS OF ORDER STATISTICS¹
By Jacques St-Pierre
University of Montreal
Introduction. Many theoretical and practical problems of a statistical nature
have led investigators to study methods capable of pooling the information contained
in the ordered (or ranked) sample values with some properties of the
assumed distribution of the parent population. Since, in analysis of variance
situations, contrasts between functions of observations are of utmost importance,
linear contrasts of order statistics will be considered here under the assumption
that the underlying distribution is normal.


Null distribution of linear contrasts of order statistics. Let $x_0, x_1, \cdots, x_n$
denote $n + 1$ independent normal random variables with unknown means
$\mu_0, \mu_1, \cdots, \mu_n$ respectively, and with a common variance $\sigma^2 = 1$ (say). Let
$x_{(0)} > x_{(1)} > \cdots > x_{(n)}$ be the ordered values. Consider the following linear contrast

$$z = x_{(0)} - c_1 x_{(1)} - c_2 x_{(2)} - \cdots - c_n x_{(n)}, \qquad \sum_{i=1}^{n} c_i = 1, \quad 0 \le c_i \le 1, \quad i = 1, 2, \cdots, n.$$

Using, as a starting point, the joint density of $x_{(0)}, x_{(1)}, \cdots, x_{(n)}$ as given by
Wilks [7], and with the help of appropriate transformations, the null distribution
of $z$ can be obtained. It takes the form of a rather messy expression containing
an $n$-fold iterated integral. An interesting particular case, the density of the
difference between the two largest ordered values, can be obtained from the
general form. St-Pierre and Zinger [6] have tabulated the null density of
$x_{(0)} - x_{(1)}$ using a slightly different method.
It is of interest to consider the above contrast in the case of three random
variables. The density of $z = x_{(0)} - c x_{(1)} - (1 - c) x_{(2)}$, under the hypothesis
$H_0$: $\mu_0 = \mu_1 = \mu_2 = 0$ (say), takes the form

$$g(z) = 3\left[\pi(c^2 - c + 1)\right]^{-1/2} \exp\left[-\frac{z^2}{4(c^2 - c + 1)}\right] \int_{\frac{(2c - 1)z}{[6(c^2 - c + 1)]^{1/2}}}^{\frac{(c + 1)z}{(1 - c)[6(c^2 - c + 1)]^{1/2}}} (2\pi)^{-1/2} \exp(-t^2/2)\, dt. \tag{1}$$


With the help of [3], [4], and [5], g(z) can be tabulated. Values of g(z) are given 
in Table I for several values of the parameter c. 
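Formula (1) is simple to evaluate numerically, since the integral is a difference of two standard normal distribution function values. The following is a minimal sketch and not the method used for the original tabulation; the function names are illustrative, and the treatment of $c = 1$ (upper limit taken as $+\infty$) is an assumption consistent with the density of the difference between the two largest values.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def null_density(z, c):
    """Density (1) of z = x(0) - c*x(1) - (1 - c)*x(2) for three N(0, 1) variables.

    Assumes 0 <= c <= 1, so that z >= 0 with probability one.
    """
    if z < 0.0:
        return 0.0
    q = c * c - c + 1.0
    root = math.sqrt(6.0 * q)
    lower = (2.0 * c - 1.0) * z / root
    # For c = 1 the upper limit of integration is taken as +infinity.
    upper = math.inf if c == 1.0 else (c + 1.0) * z / ((1.0 - c) * root)
    front = 3.0 / math.sqrt(math.pi * q) * math.exp(-z * z / (4.0 * q))
    return front * (std_normal_cdf(upper) - std_normal_cdf(lower))
```

For instance, `null_density(1.0, 0.0)` gives approximately 0.41774 and `null_density(0.0, 1.0)` approximately 0.84628, in agreement with the corresponding entries of Table I.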

From the general form (1), several densities can be derived as particular cases.
For instance, the value $c = 0$ leads to the density of the range as given by McKay

Received March 28, 1955; revised July 7, 1957. 
TABLE I
Values of $g(z)$, for various values of the constant $c$, where
$z = x_{(0)} - c x_{(1)} - (1 - c) x_{(2)}$

  z \ c     0        0.1      0.2      0.4      0.6      0.8      0.9      1.0
  0.0    0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.84628
  0.2    0.10917  0.12101  0.13709  0.17898  0.26659  0.49282  0.73839  0.78334
  0.4    0.21095  0.23318  0.26050  0.33877  0.47562  0.70969  0.75554  0.70763
  0.6    0.29932  0.32877  0.36410  0.45941  0.59859  0.71048  0.67281  0.62378
  0.8    0.36927  0.40194  0.43988  0.53834  0.64049  0.63299  0.58344  0.53652
  1.0    0.41774  0.44958  0.48459  0.55897  0.60386  0.54116  0.49344  0.45022
  1.2    0.44376  0.47102  0.49861  0.54388  0.53555  0.45020  0.40687  0.36855
  1.4    0.44833  0.46822  0.48548  0.49838  0.45187  0.36497  0.32709  0.29429
  1.6    0.43408  0.44502  0.45086  0.43473  0.36725  0.28832  0.25636  0.22920
  1.8    0.40476  0.40647  0.40149  0.36270  0.28937  0.22196  0.19588  0.17410
  2.0    0.36474  0.35800  0.34410  0.29160  0.22168  0.16649  0.14590  0.12896
  2.2    0.31842  0.30485  0.28468  0.22659  0.16529  0.12170  0.10593  0.09315
  2.4    0.26981  0.25145  0.23115  0.17064  0.12000  0.08668  0.07497  0.06560
  2.6    0.22221  0.20121  0.17665  0.12479  0.08484  0.06016  0.05171  0.04504
  2.8    0.17809  0.15639  0.13290  0.08871  0.05840  0.04068  0.03477  0.03016
  3.0    0.13903  0.11819  0.09715  0.06135  0.03918  0.02679  0.02278  0.01968
  3.2    0.10580  0.08692  0.06904  0.04130  0.02556  0.01720  0.01455  0.01252
  3.4    0.07853  0.06225  0.04775  0.02707  0.01625  0.01076  0.00905  0.00777
  3.6    0.05690  0.04345  0.03215  0.01727  0.01006  0.00656  0.00548  0.00469
  3.8    0.04026  0.02975  0.02109  0.01073  0.00606  0.00389  0.00327  0.00277
  4.0    0.02782  0.01971  0.01348  0.00650  0.00356  0.00225  0.00188  0.00159

and Pearson [2]; while the value $c = 0.5$ leads to the density of $v = x_{(0)} - (x_{(0)} +
x_{(1)} + x_{(2)})/3$ as given by McKay [1]. The complexity of the expression for $g(z)$
increases rapidly with the number of variables; consequently, we will limit our
presentation to the above mentioned case.


Non-null distribution of linear contrasts of order statistics. Here again, and for
the same reasons, only the case of three variables will be presented. In order to
get the non-null distribution of $z = x_{(0)} - c x_{(1)} - (1 - c) x_{(2)}$ the joint density
of $x_{(0)}$, $x_{(1)}$ and $x_{(2)}$ must be used as a starting point. It is of the form

$$g(x_{(0)}, x_{(1)}, x_{(2)}) = \frac{1}{(2\pi)^{3/2}} \exp\left[-\tfrac{1}{2}(\mu'\mu + X'X)\right] \sum{}^{*} \exp(\mu_i' X),$$

where

$$\mu = \begin{pmatrix} \mu_0 \\ \mu_1 \\ \mu_2 \end{pmatrix}, \qquad X = \begin{pmatrix} x_{(0)} \\ x_{(1)} \\ x_{(2)} \end{pmatrix}, \qquad \mu_i = \begin{pmatrix} \mu_{i_0} \\ \mu_{i_1} \\ \mu_{i_2} \end{pmatrix},$$

and $\sum^{*}$ stands for the summation over all the permutations $i_0, i_1, i_2$ of the
numbers 0, 1 and 2. Introducing the contrast $z$ with the appropriate transformation
and integrating out the extra variables, one gets, after a few simplifications,
the following expression for the non-null density of $z$:


l 1 ' 
f(z) = exp | — — (wu — M°/3) 
VJ/2e JHE —c+D I | 5 HH | 
( —(22 — 2nz ( 22)" 
(2) ‘ex | . m2) lexp| 5 = 2h | 
. 4(? —c+ 1) 12(c? —c + 1) 
; 2 [(e+1 Ie) (yy +27¥2))/ (Cle) (6(e2?—e+1)) #2) 
[ (27) "exp (-f 2) dt 
J [(2e—1) 2—(7 14-279) /16(e2 c+}! /2 
where $\gamma_1 = \mu_{i_0} - c\mu_{i_1} - (1 - c)\mu_{i_2}$, $\gamma_2 = -(1 - c)\mu_{i_0} + \mu_{i_1} - c\mu_{i_2}$, and $M =
\mu_0 + \mu_1 + \mu_2$. It is easy to see, looking at (2), how much more complicated an
expression for $f(z)$ can become in the case of several variables.

Many particular cases of interest have been considered, using expression (2)
as a starting point. Only two cases are reported here. The first one corresponds to
the hypothesis $H_1$: $\mu_0 = \delta$, $\mu_1 = \mu_2 = 0$, $\delta > 0$. Denoting by $f(z \mid H_1)$ the density
of $z$ under the hypothesis $H_1$, one gets


“/, | ] . 
f(z| hy) = — exp |(—8/3)(g: + ge + gs)|, 
Vr(er=— ¢ + 1) 


where gi , gz and g; are functions of 2 and of the parameters 6 and ¢ given by 
gi(z; 6, ¢c) = exp [— (32° — Gie — (2c — 1)8)/12A(e¢ — c+ 1)|/,(z; 6, ¢), 
gz; 6, ¢) = exp [—(32° + edz — (2 — c)°6)/12(6 — & + 1)|lo(z; 4, €), 
g3(z; 6, c) = exp [—(32 + 6(1 — c)éz — (1 + €)°6)/12(e — ¢ + I) /5(z36, 0). 


The functions J; , 72, and J; are given by 


[(e+1)z—(1—e) (2e—1)6 c+1)s—(1—e)(2—c)4 


=elie=eth? exp (—f/2) i —eiet—e+)' 2 exp (—0°/2) 
eee Ges hk i= 7 Pe 
2e—1)8s—(Ze—1)8 (2a)! 2c—1 -(2—c)é (2q)! 
6§(e2—e+1))}1/2 H(et—c+1 3 
e+1)2+/(1+¢e)(1—-0e)6 ms 
I=c)[6(e2@—e FD)? exp (—t/2) 
I, = [ xT dt. 
2ce—1)3+(1+¢)5 (Qy)!/2 
6ic?—c+1 : 
Table II contains the values of $f(z \mid H_1)$, in the case $\delta = 1$, for several values of
the parameter $c$.

The case of equal spacing of the true means, i.e., the one corresponding to the
hypothesis $H_2$: $\mu_0 = 2\delta$, $\mu_1 = \delta$, $\mu_2 = 0$, yields a slightly more complicated
expression for $f(z \mid H_2)$. Table III contains some values of $f(z \mid H_2)$, in the particular
case $\delta = 1$, for a few values of the parameter $c$.








Values of the density of z = xo) — 


Hf, wo = 


* 
0.0 0.00000 
0.2 O7843 
O.4 15340 
0.6 22169 
OLS 28049 
1.0 32704 
ioe 36168 
1.4 38200 
1.6 SARAD 
ILS 38314 
2.0 SHO5S 
a0 34126 
2.4 30054 
2.6 27387 
2.8 23387 
+0 19953 
, 2 16449 
4.4 13256 

Values of the 
0.0 0.00000 
0.2 O4056 
0.4 OS 100 
0.6 12140 
OLS 16106 
1.0 W034 
Law 2351S 
1.4 26731 
1.6 20437 
1.8 s1476 
2.0 32837 
2.2 33363 
2.4 33070 
2.6 31006 
2.8 S0177 
3.0 2702 
o.2 25165 
3.4 22185 




oO; 


0. 00000 
OS707 
16084 
24434 
30717 
35584 
SSS6S 
40550 
JOGO 
BOF 
sSTOTo 
33851 
30284 
26024 
21039 
1S053 
14500 


de nsily of 2 


QOO00 
O3960 
QOO37 
S41 
17860 
2205S 
25927 
20305 
32027 
33850 
34951 
35004 
34123 
y2308 
20072 
27019 


0.00000 
O9O7S3 
19015 
27187 
33879 
38797 
11801 
12004 
$2265 
10156 
36936 
S2048 
28546 
24117 
1OS34 


15882 


QOO00 
O5060 
LO133 
15152 
20030 
24647 
2880S 
3229S 
34906 
36482 
S0040 
36285 
34610 
32083 
28927 


25383 
TABLE II
Values of the density of $z = x_{(0)} - c x_{(1)} - (1 - c) x_{(2)}$ under the hypothesis
$H_1$: $\mu_0 = \delta = 1$, $\mu_1 = \mu_2 = 0$
c = 0.4 0.6 0.8 0.9 1.0
0.00000 0.00000 -0.00000 0.00000 0.69550 
. 12988 19255 36249 58377 .65223 
. 24916 35644 . 56737 . 63842 . 60265 
BASAT 47043 60539 . 58628 54862 
.42129 52678 . 56566 . 52858 .49195 
46387 53322 50692 46915 45436 
47716 51027 $4557 40985 37744 
{6580 45370 38510 35211 32250 
43486 39556 32714 29733 27098 
39238 33636 27293 24660 22357 
34034 27009 22364 20067 18002 
28770 22830 17958 16014 14373 
23706 . 18255 14116 12632 11184 
19065 1428S 
14980 
TABLE III
Values of the density of $z = x_{(0)} - c x_{(1)} - (1 - c) x_{(2)}$ under the hypothesis $H_2$: $\mu_0 = 2\delta$,
$\mu_1 = \delta$, $\mu_2 = 0$; $\delta = 1$
0.00000 0.00000 0.00000 0.00000 54317 
06759 LOST 20168 38016 52172 
13497 20146 37844 51800 .49788 
20107 29509 47802 . 50288 $7151 
26360 37311 49628 47401 $4252 
31926 12638 $7505 +4270 .41092 
36377 45112 .44210 40851 37691 
39430 4491S .40526 37182 . 34096 
10863 $2718 36591 .3333 1 . 30372 
10664 39249 32495 .29381 26591 
3S986 35119 . 28344 . 25435 . 22887 
36159 30690 23920 . 21601 19316 
32502 26260 19934 17977 15978 
28390 21798 
24141 
ADMISSIBLE ONE-SIDED TESTS FOR THE MEAN OF A
RECTANGULAR DISTRIBUTION¹


By J. W. Pratt
Harvard University


1. Theorem. Suppose we have a sample of $n > 1$ independent observations from
a uniform distribution with unknown mean $\theta$ and known range $R$. Suppose we wish
to test $H_0$: $\theta \le \theta_0$ against $H_1$: $\theta > \theta_0$. Then an essentially complete class of admissible
tests is the class $\mathcal{A}$ of all tests of the following type. Let $u$ be the minimum
observation, $v$ the maximum. Let $g(u)$ be a nonincreasing function of $u$ such that
$g(u) = \theta_0 + \tfrac{1}{2}R$ for $u < \theta_0 - \tfrac{1}{2}R$. Accept $H_0$ if and only if $v < g(u)$.


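2. Discussion. The two-sided problem has been treated by Allan Birnbaum
[1]. He showed that, for testing $H_0'$: $\theta = \theta_0$ against $H_1'$: $\theta \ne \theta_0$, an essentially
complete class of admissible tests is the class of all tests of the following type.
Let $v(u)$ be a nondecreasing function of $u$. Accept $H_0'$ if and only if $v > v(u)$ and
$\theta_0 - \tfrac{1}{2}R < u \le v < \theta_0 + \tfrac{1}{2}R$.

Birnbaum [1] also noted that there is a uniformly most powerful size $\alpha$ test
of $H_0'$: $\theta = \theta_0$ against $H_1$: $\theta > \theta_0$, namely that accepting $H_0'$ if $\theta_0 - \tfrac{1}{2}R < u <
\theta_0 + (\tfrac{1}{2} - \alpha^{1/n})R$ and $v < \theta_0 + \tfrac{1}{2}R$. This corresponds in our notation to

$$g(u) = \begin{cases} \theta_0 + \tfrac{1}{2}R & \text{for } u < \theta_0 + (\tfrac{1}{2} - \alpha^{1/n})R, \\ \theta_0 - \tfrac{1}{2}R \ \text{(say)} & \text{otherwise.} \end{cases}$$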
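The uniformly most powerful test just described can be written out as a short sketch. The code below is illustrative only (the function names are assumptions): it encodes the boundary function $g(u)$ for that test, the acceptance rule $v < g(u)$ of the theorem, and the size at $\theta = \theta_0$, which equals the probability that all $n$ observations exceed the cutoff $\theta_0 + (\tfrac{1}{2} - \alpha^{1/n})R$.

```python
def ump_cutoff(theta0, R, alpha, n):
    """Cutoff theta0 + (1/2 - alpha**(1/n)) * R for the minimum observation u."""
    return theta0 + (0.5 - alpha ** (1.0 / n)) * R

def g_ump(u, theta0, R, alpha, n):
    """Boundary function g(u) of the UMP size-alpha test, in the theorem's notation."""
    if u < ump_cutoff(theta0, R, alpha, n):
        return theta0 + 0.5 * R
    return theta0 - 0.5 * R

def accept_h0(sample, theta0, R, alpha):
    """Accept H0 if and only if v < g(u), with u the minimum and v the maximum."""
    u, v = min(sample), max(sample)
    return v < g_ump(u, theta0, R, alpha, len(sample))

def size_at_theta0(theta0, R, alpha, n):
    """P(reject | theta = theta0) = P(all n observations >= cutoff)."""
    upper = theta0 + 0.5 * R
    return ((upper - ump_cutoff(theta0, R, alpha, n)) / R) ** n
```

With $\theta_0 = 0$, $R = 2$, $\alpha = 0.05$ and $n = 3$ the cutoff is about 0.263, so the sample $(-0.9, 0.1, 0.3)$ leads to acceptance and the sample $(0.5, 0.7, 0.9)$ to rejection.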
In this rather simple situation, then, an essentially complete class of admissible
tests of the simple hypothesis against one-sided alternatives consists of the
uniformly most powerful test (just described) for each significance level, but the
class of admissible tests of the composite hypothesis against one-sided alternatives
is very general. The class of admissible tests of the simple hypothesis against two-sided
alternatives is also very general, but quite different. It includes unions of
admissible lower and upper one-sided rejection regions (if and) only if they are
admissible for the simple hypothesis, and such unions form a portion "of measure
zero" in the whole class.

In the following section we will prove the result stated in the first paragraph.
The proof uses no general results of decision theory, such as the complete class
theorem, but only direct methods of an essentially elementary constructive type.
It obviously works in some slightly more general situations, which are given explicitly
in [2].


3. Proof. Without loss of generality we may take $\theta_0 = 0$, $R = 2$. Since $(u, v)$
is a sufficient statistic, an essentially complete class of tests is the class of all
randomized tests based on $(u, v)$. Suppose such a test is given, accepting $H_0$ with
probability $\phi_0(u, v)$ when $(u, v)$ is observed.

The triangle $T(\theta) = \{(u, v): \theta - 1 < u \le v < \theta + 1\}$ contains $(u, v)$ with
probability one if $\theta$ is the true mean. The probability of accepting $H_0$ using the
test function $\phi$ is

$$E_\theta(\phi) = \iint_{T(\theta)} \phi(u, v)\, 2^{-n} n(n - 1)(v - u)^{n-2}\, du\, dv. \tag{1}$$

If $\theta \ge 0$, then $u > -1$ with probability one. If $\theta \le 0$, then $v < 1$ with probability
one. Thus if $(u, v)$ is not in $T(0)$, we know which hypothesis is correct.
Accordingly, let

$$\phi_1(u, v) = \begin{cases} \phi_0(u, v) & \text{if } (u, v) \in T(0), \\ 1 & \text{if } u \le -1, \\ 0 & \text{if } v \ge 1. \end{cases} \tag{2}$$

Then $\phi_1$ dominates $\phi_0$, i.e. $\phi_1$ is at least as good as $\phi_0$ for any $\theta$, i.e.

$$E_\theta(\phi_1) \ge E_\theta(\phi_0) \ \text{for } \theta \le 0, \qquad E_\theta(\phi_1) \le E_\theta(\phi_0) \ \text{for } \theta > 0. \tag{3}$$

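Define $f(v)$ for $-1 < v < 1$ by

$$\int_{-1}^{f(v)} (v - u)^{n-2}\, du = \int_{-1}^{v} \phi_1(u, v)(v - u)^{n-2}\, du, \qquad -1 \le f(v) \le v. \tag{4}$$

Let

$$\phi_2(u, v) = \begin{cases} 1 & \text{if } u < f(v),\ -1 < v < 1, \text{ or if } v \le -1, \\ 0 & \text{if } u > f(v),\ -1 < v < 1, \text{ or if } v \ge 1. \end{cases} \tag{5}$$

Then, with respect to the density $2^{-n} n(n - 1)(v - u)^{n-2}\, du\, dv$, $-1 < u < v$,
$\phi_2$ has the same mass as $\phi_1$ on each horizontal line in the $(u, v)$-plane, but concentrates
it as far to the left as possible. Furthermore, $\phi_2 = \phi_1$ except on $T(0)$.
Therefore

$$E_\theta(\phi_2) \ge E_\theta(\phi_1) \ \text{for } \theta \le 0, \qquad E_\theta(\phi_2) \le E_\theta(\phi_1) \ \text{for } \theta > 0. \tag{6}$$

Therefore $\phi_2$ dominates $\phi_1$.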
Define $g(u)$ for $-1 < u < 1$ by

$$\int_{u}^{g(u)} (v - u)^{n-2}\, dv = \int_{u}^{1} \phi_2(u, v)(v - u)^{n-2}\, dv, \qquad u < g(u) \le 1, \tag{7}$$

if the right-hand side is positive. If the right-hand side vanishes, or if $u \ge 1$,
let $g(u) = -1$. If $u \le -1$, let $g(u) = 1$. Let

$$\phi_3(u, v) = \begin{cases} 1 & \text{if } v < g(u), \\ 0 & \text{if } v \ge g(u). \end{cases} \tag{8}$$

Then, with respect to the density $2^{-n} n(n - 1)(v - u)^{n-2}\, du\, dv$,
$\phi_3$ has the same mass as $\phi_2$ on each vertical line in the $(u, v)$-plane, but concentrates
it as low as possible. Furthermore, $\phi_3 = \phi_2$ except on $T(0)$. Therefore

$$E_\theta(\phi_3) \ge E_\theta(\phi_2) \ \text{for } \theta \le 0, \qquad E_\theta(\phi_3) \le E_\theta(\phi_2) \ \text{for } \theta > 0. \tag{9}$$

Therefore $\phi_3$ dominates $\phi_2$.

By (5), $\phi_2(u, v)$ is nonincreasing in $u$ for each $v$. Therefore, by (7), for $-1 <
u < g(u)$, $-1 < g(u) \le 1$, and $g(u)$ is nonincreasing in $u$. This is the essential
part of the requirement that $\phi_3$ be in $\mathcal{A}$, and $g(u)$ was defined for other values of
$u$ so that $\phi_3$ actually is in $\mathcal{A}$.
We have thus shown that any test is dominated by a test in $\mathcal{A}$, i.e. that $\mathcal{A}$ is
essentially complete. It remains to prove admissibility. Suppose $\phi$ and $\phi^*$ are
given by $g$ and $g^*$. Without changing the characteristics of the tests, we may redefine
$g$ and $g^*$ so that they are left-continuous and so that $g(u) = -1$ where
$g(u) \le u$, and $g^*(u) = -1$ where $g^*(u) \le u$. Suppose there is a $u'$ such that
$g(u') > g^*(u')$. Choose $u''$ such that $g(u') > u'' > g^*(u')$. (See the diagram.)
Let "area" be measured with respect to the density $2^{-n} n(n - 1)(v - u)^{n-2}\, du\, dv$.
By left-continuity, $g^*(u) < u''$ for all $u$ in an interval whose right endpoint is $u'$.
Therefore either the "area" below $g$ in $T(u' + 1)$ is less than that below $g^*$, or
the "area" below $g$ in $T(u'' - 1)$ is greater than that below $g^*$. But the "area"
below $g$ in $T(\theta)$ is just $E_\theta(\phi)$. Thus either $E_{u'+1}(\phi) < E_{u'+1}(\phi^*)$ or $E_{u''-1}(\phi) >
E_{u''-1}(\phi^*)$. But $u' + 1 > 0$ and $u'' - 1 < 0$, so this shows $\phi$ doesn't dominate
$\phi^*$. Hence if $\phi$ dominates $\phi^*$, $g(u') \le g^*(u')$ for all $u'$. But in this case either
$\phi$ and $\phi^*$ are essentially the same or $E_\theta(\phi) < E_\theta(\phi^*)$ for sufficiently small positive
$\theta$. Therefore $\phi$ cannot dominate $\phi^*$. Since $\phi$ and $\phi^*$ were arbitrary tests of the
essentially complete class $\mathcal{A}$, it follows that all tests in $\mathcal{A}$ are admissible.
This proof of admissibility is spelled out analytically in [2]. The proof of essential
completeness given there uses a general property possessed by the rectangular
distribution.
A METHOD FOR SELECTING THE SIZE OF THE INITIAL SAMPLE
IN STEIN'S TWO SAMPLE PROCEDURE


By Jack Moshman
Corporation for Economic and Industrial Research, Arlington 2, Virginia


1. Summary and Introduction. The use of an upper percentage point of the
distribution of total sample size in conjunction with the expectation of the latter
is proposed as a guide to the selection of the size of the initial sample when
using some version of Stein's [5] two-sample procedure. It is a rapidly calculable
function of the underlying population variance based on existing tables of the
$\chi^2$ distribution. A rule-of-thumb is proposed to be used in making the actual
selection of initial sample size. It is a simple matter to investigate the nature
of the percentage point for different values of the variance over a limited range;
a recommended conservative choice when the variance is not known is the selection
of a large initial sample.

Dantzig [2] proved the nonexistence of nontrivial tests of Student’s hypothesis 
whose power was independent of the variance, a result extended by Stein to the 
general linear hypothesis. In the same paper Stein proposed a two-sample pro- 
cedure whose power was independent of variance. The same two-sample method 
could be used to obtain a confidence interval for the mean of a normal distribu- 
tion with predetermined length and confidence coefficient. 

Stein gave no specifications for the choice of the initial sample size, but 
Seelbinder [4] suggested that it be selected to minimize the expectation of the 
total sample. In a recent paper, Bechhofer, Dunnett and Sobel [1] used Stein’s 
procedure for another application, noting that the variance of the total sample 
size increased as the size of the first sample decreased. 

An efficient choice of the size of the initial sample will hold the expectation 
of the sample small, and will further reduce the probability of an extremely 
large total sample. This note will explore the matter in further detail and show 
that an upper percentage point of the distribution of total sample size, when 
used in conjunction with the expectation, is a rapidly calculable guide to an 
efficient choice of the size of the first sample. 


2. Basic theory. As developed by Stein, the two-sample procedure involves a
preliminary, arbitrary choice of a positive integer $N_0$ and a number $z > 0$. The
value of $z$ will depend, when constructing a confidence interval of length $2L$ for
the mean, on the precision of the estimate, i.e., the length of the interval, and
its reliability, the confidence coefficient. Specifically, if $t_{n,\gamma}$ is the upper $100\gamma$
percentage point of Student's distribution with $n$ degrees of freedom, one would
take $z = L^2/t^2_{N_0 - 1,\,\alpha/2}$ to obtain a confidence coefficient $\ge 1 - \alpha$.

A sample of $N_0$ observations is taken and $s^2 = \sum (x_i - \bar{x})^2/(N_0 - 1)$ is computed
as an estimate of the unknown variance $\sigma^2$ with $n = N_0 - 1$ degrees of
freedom. The total sample size, $N$, is then

$$N = \max\left(\left[\frac{s^2}{z}\right] + 1,\ N_0\right), \tag{1}$$

where $[t]$ is the largest integer less than $t$.
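As a small illustration of rule (1), the total sample size can be computed as follows. This is a sketch with an assumed function name; it uses the fact that, when $[t]$ denotes the largest integer strictly less than $t$, $[t] + 1$ equals the ceiling of $t$.

```python
import math

def total_sample_size(s2, z, n0):
    """Total sample size N = max([s2/z] + 1, n0) from rule (1).

    [t] is the largest integer strictly less than t, so [t] + 1 == ceil(t).
    """
    if z <= 0:
        raise ValueError("z must be positive")
    return max(math.ceil(s2 / z), n0)
```

For example, with $s^2 = 4.0$, $z = 0.5$ and $N_0 = 5$ the rule gives $N = 8$, while $s^2 = 1.0$ with the same $z$ and $N_0$ requires no second sample ($N = N_0 = 5$).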
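Hence it follows that

$$\text{Prob}(N = N_0) = \text{Prob}\left(\frac{s^2}{z} \le N_0\right) = \text{Prob}\left(\frac{ns^2}{\sigma^2} = \chi^2(n) \le \frac{nN_0 z}{\sigma^2}\right), \tag{2}$$

where $\chi^2(n)$ is distributed as $\chi^2$ with $n$ degrees of freedom. Furthermore, for
integral $m > N_0$,

$$\text{Prob}(N = m) = \text{Prob}\left(m - 1 < \frac{s^2}{z} \le m\right) = \text{Prob}\left(\frac{n(m - 1)z}{\sigma^2} < \chi^2(n) \le \frac{nmz}{\sigma^2}\right). \tag{3}$$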
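Therefore, letting $\lambda = z/\sigma^2$, one may easily show

(4) $E(N) = N_0\,\mathrm{Prob}(\chi^2(n) < n\lambda N_0) + \dfrac{1}{\lambda}\,\mathrm{Prob}(\chi^2(n+2) > n\lambda N_0) + \theta_1\,\mathrm{Prob}(\chi^2(n) > n\lambda N_0)$

and

(5) $\mathrm{Var}(N) = N_0^2\,\mathrm{Prob}(\chi^2(n) < n\lambda N_0) + \dfrac{n+2}{n\lambda^2}\,\mathrm{Prob}(\chi^2(n+4) > n\lambda N_0) + \dfrac{2\theta_2}{\lambda}\,\mathrm{Prob}(\chi^2(n+2) > n\lambda N_0) + \theta_3\,\mathrm{Prob}(\chi^2(n) > n\lambda N_0) - (E(N))^2,$

where $0 \le \theta_i \le 1$, $i = 1, 2, 3$.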


Whereas (4) defines $E(N)$ within a maximum error of unity, (5) is not as useful
inasmuch as the factor $1/\lambda$ may be, and frequently is, large.
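
The tail expectation behind the middle term of (4) can be checked in one line. The following is a sketch of that step, not taken from the original note; it writes $Y = s^2/z = \chi^2(n)/(n\lambda)$ and $c = n\lambda N_0$ (both symbols introduced here for convenience), and uses only the identity $x\,p_n(x) = n\,p_{n+2}(x)$ for the $\chi^2$ density $p_n$:

```latex
\begin{align*}
E\bigl[\,Y;\ Y > N_0\,\bigr]
  &= \frac{1}{n\lambda}\int_{c}^{\infty} x\,p_n(x)\,dx
   = \frac{1}{n\lambda}\int_{c}^{\infty} n\,p_{n+2}(x)\,dx
   = \frac{1}{\lambda}\,\mathrm{Prob}\bigl(\chi^2(n+2) > c\bigr).
\end{align*}
```

Since $Y \le [Y] + 1 < Y + 1$, replacing $Y$ by $N = [Y] + 1$ on the event $Y > N_0$ adds at most $\mathrm{Prob}(\chi^2(n) > c)$, which accounts for the term in $\theta_1$. In the same way, $x^2 p_n(x) = n(n+2)\,p_{n+4}(x)$ yields the factor $(n+2)/(n\lambda^2)$ in (5).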

Furthermore, it is somewhat difficult to translate Var $(N)$ into working percentage
points of the distribution of $N$. A more useful procedure is to calculate
a given percentage point $N_p$ of the distribution. This may be accomplished
directly from (2) and (3). Define $N_p$ as the smallest integer $\ge N_0$ such that

(6) $\mathrm{Prob}(N \le N_p) = \sum_{m=N_0}^{N_p} \mathrm{Prob}(N = m) \ge p.$

But this is equivalent, if one writes $p_n(\chi^2)$ as the probability density function
of $\chi^2(n)$, to setting

(7) $\int_0^{n N_p \lambda} p_n(\chi^2)\,d\chi^2 = p$

and letting $N_p$ be chosen to satisfy (7), but not less than $N_0$. Thus

(8) $N_p = \max\left\{N_0,\ \left[\dfrac{1}{\lambda}\left(100p\text{th percentage point of } \dfrac{\chi^2(n)}{n}\right)\right] + 1\right\}.$


The percentage point of $\chi^2(n)/n$ is tabulated in Hald [3], for example. Note that the upper
percentage points of $\chi^2(n)/n$ decrease monotonically as $n$ increases. Conceivably, if $N_0$ is
chosen very large, one can be reasonably confident that no further sampling will
be necessary, but this is not an efficient procedure.

A rough, but objective, rule of thumb may be derived by the following
consideration: Let $E(N \mid N_0^*)$ be the expectation of $N$ if $N_0 = N_0^*$ and $N_p(N_0^*)$
the 100pth percentile of $N$ if $N_0 = N_0^*$. Define

(9) $P(N_0^*) = \int_0^{n\lambda E(N \mid N_0^*)} p_n(\chi^2)\,d\chi^2,$

as the proportion of time $N$ will not exceed $E(N \mid N_0^*)$. Let $\bar{N}_0$ be the value of
$N_0$ which minimizes $E(N)$, i.e.,

(10) $E(N \mid \bar{N}_0) \le E(N \mid N_0)$

for all $N_0$. Now one might investigate alternative values of $N_0$ by considering

(11) $\psi(N_0) = (1 - p)\,\bigl(N_p(\bar{N}_0) - N_p(N_0)\bigr) - \bigl(1 - P(\bar{N}_0)\bigr)\,\bigl(E(N \mid N_0) - E(N \mid \bar{N}_0)\bigr)$

and selecting $N_0$ as the integer for which $\psi(N_0)$ is a maximum. In effect, (11)
weights the expected changes in $E(N)$ and $N_p$ by the probability of exceeding
those values. It would be expected that $p$ would be chosen independently from
nonstatistical considerations.


3. Example. If one takes $\lambda = .1$, where in the ordinary application considered
by Stein $n = N_0 - 1$, then $E(N)$ is a minimum for $N_0 = \bar{N}_0 = 3$. Values of
$E(N \mid N_0)$ are tabulated in Table 1. It may be seen that $E(N \mid N_0)$ is fairly
constant over a considerable range. The same table also contains $N_{.95}(N_0)$, which
decreases sharply where $E(N \mid N_0)$ is relatively constant.

It may readily be verified from (9) that $P(\bar{N}_0) = P(3) \approx .64$. One may rapidly
evaluate $\psi(N_0)$ from (11), taking $p = .95$, and find that $\psi(6) = .2686$ is
the maximum. Hence the rule suggested specifies $N_0 = 6$ as the proper choice.
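
The arithmetic of this example can be retraced from the tabulated values. The following Python sketch is illustrative only and is not part of the original note: the function names are hypothetical, the Table 1 entries are transcribed by hand (reading the second and third rows of the left half as $N_0 = 3$ and $N_0 = 4$), and $P(3)$ is computed from the closed form $\mathrm{Prob}(\chi^2(2) \le x) = 1 - e^{-x/2}$, which applies only because $n = 2$ when $N_0 = 3$. The rule (11) is evaluated with the rounded value $.64$, as in the text.

```python
import math

LAMBDA = 0.1    # lambda = z / sigma^2 used in the example
P_LEVEL = 0.95  # percentile level p

# Table 1, transcribed by hand: N0 -> (E(N | N0), N_.95(N0)).
TABLE_1 = {
    2: (10.45, 38.41), 3: (10.29, 29.96), 4: (10.45, 26.05), 5: (10.51, 23.72),
    6: (10.63, 22.14), 7: (10.80, 20.99), 8: (11.18, 20.10), 9: (11.43, 19.38),
    10: (11.84, 18.80), 12: (12.92, 17.89), 14: (14.35, 17.20), 16: (16.15, 16.66),
    18: (18.02, 18.00), 20: (20.02, 20.00), 22: (22.01, 22.00), 24: (24.00, 24.00),
}


def prob_not_exceeding_mean(n0):
    """Equation (9), restricted to n0 = 3 so that n = 2 and the chi-square CDF is elementary."""
    n = n0 - 1
    if n != 2:
        raise ValueError("closed form used here is valid only for n = 2")
    expected_n, _ = TABLE_1[n0]
    return 1.0 - math.exp(-n * LAMBDA * expected_n / 2.0)


def psi(n0, n0_min=3, p_min=0.64):
    """Equation (11): gain in the percentile weighed against loss in the expectation."""
    e_min, np_min = TABLE_1[n0_min]
    e_n, np_n = TABLE_1[n0]
    return (1.0 - P_LEVEL) * (np_min - np_n) - (1.0 - p_min) * (e_n - e_min)


best = max(TABLE_1, key=psi)
print(round(prob_not_exceeding_mean(3), 2))  # 0.64
print(best, round(psi(best), 4))             # 6 0.2686
```

Using the unrounded value of $P(3)$ changes $\psi(6)$ only in the third decimal place.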


4. Discussion. When the variance is unknown, two alternatives exist. It may
be feasible to express the length of the confidence interval desired as a proportion
of $\sigma$; no difficulty then ensues since $\lambda$ is specified. If $L$ is specified absolutely,
in most practical cases a range for $\sigma$ is known. One can then investigate the
distribution of $N$ for various values of $\sigma$ in this range and make a subsequent
choice of $N_0$.

The procedures suggested in this note are particularly applicable to those 
situations where repeated sampling is not contemplated and/or there exists a 
physical reason for wanting to avoid excessively large samples. The latter situa- 
tion may obtain where larger individual samples may entail the purchase of 
additional test equipment or require the supplementing of a regular interviewing 
staff by extra employees. 


TABLE 1
Dependence of E(N) and N_.95 on N_0

λ = .1

N_0    E(N | N_0)    N_.95(N_0)        N_0    E(N | N_0)    N_.95(N_0)
 2       10.45         38.41            10      11.84         18.80
 3       10.29         29.96            12      12.92         17.89
 4       10.45         26.05            14      14.35         17.20
 5       10.51         23.72            16      16.15         16.66
 6       10.63         22.14            18      18.02         18.00
 7       10.80         20.99            20      20.02         20.00
 8       11.18         20.10            22      22.01         22.00
 9       11.43         19.38            24      24.00         24.00
ON A PROBLEM IN MEASURE-SPACES


By V. S. VARADARAJAN 


Indian Statistical Institute, Calcutta 


Summary. Let $\mathcal{F}$ be the family of all random variables on a probability space
$\Omega$ taking values from a separable and complete metric space $X$. In this paper we
prove that $\mathcal{F}$ is in a certain sense a closed family. More precisely, if $\{\xi_n\}$ is a
sequence of $X$-valued random variables such that their probability distributions
converge weakly to a probability distribution $P$ on $X$, then there exists an
$X$-valued random variable on $\Omega$ with distribution $P$. An example is also given which
shows that the assumption of completeness of $X$ cannot in general be dropped.


1. Preliminary remarks. In what follows $(\Omega, \mathcal{S}, \mu)$ is a probability space and $X$ a
separable metric space. We denote by $\mathcal{B}$ the class of Borel subsets of $X$, defined
as the minimal $\sigma$-field containing all open subsets of $X$.

A map $\xi$ of $\Omega$ into $X$ is called a random variable if it is measurable, i.e.,
$\xi^{-1}(A) \in \mathcal{S}$ for each $A \in \mathcal{B}$. If $\xi$ is a random variable we define as its distribution
the measure $\mu_\xi$ on $\mathcal{B}$ given by

$\mu_\xi(A) = \mu\{\xi^{-1}(A)\}$

for all $A \in \mathcal{B}$. A given probability measure $P$ on $\mathcal{B}$ is said to be induced from $\Omega$
if there exists a random variable $\xi$ such that $P = \mu_\xi$.

Suppose we are given a sequence $\{P_n\}$ of probability measures on $\mathcal{B}$. We say
that $\{P_n\}$ converges weakly to a probability measure $P$ on $\mathcal{B}$ ($P_n \Rightarrow P$ in symbols)
if

$\lim_{n \to \infty} \int_X g\,dP_n = \int_X g\,dP$

for every bounded continuous function $g$ on $X$. In terms of subsets of $X$ this is
equivalent to

$\limsup_{n \to \infty} P_n(C) \le P(C)$

for every closed set $C \subset X$ ([1]). When $X$ is the real line with the usual topology,
this convergence is equivalent to the usual convergence of distributions.


2. The main theorem. In this section we state and prove the main theorem. 
Before doing it we prove a lemma. 

LEMMA. Let $X$ be a separable and complete metric space and $(\Omega, \mathcal{S}, \mu)$ a nonatomic
probability space ([2] p. 168). Then any probability measure on $\mathcal{B}$ can be induced
from $\Omega$.

PROOF. Since $X$ is a separable metric space, it can be imbedded homeomorphically
into a countable product of unit intervals by a celebrated theorem of
Urysohn ([3] p. 125). Since it is also complete, the image of $X$ will be a $G_\delta$ by a
theorem of Lavrentieff ([3] p. 207). $X$ can thus be regarded as a Borel subset of
a countable product of unit intervals. This implies however that $X$ can be regarded
as a Borel subset of the unit interval, since the unit interval and the countable
product of such intervals can be connected by a one-one map which is
measurable both ways. It is thus sufficient to show that any probability measure
on the unit interval can be induced from $\Omega$. This however is a well-known result.

We now prove the main theorem. 

THEOREM. Let $X$ be a separable and complete metric space and $(\Omega, \mathcal{S}, \mu)$ an
arbitrary probability space. If $\{\xi_n\}$ is a sequence of $X$-valued random variables
such that $\mu_{\xi_n} \Rightarrow P$ as $n \to \infty$, where $P$ is a probability measure on $\mathcal{B}$, there exists
an $X$-valued random variable $\xi$ such that $P = \mu_\xi$.

PROOF. Any measure space can be decomposed into its atomic and nonatomic
components and in view of the previous lemma we can assume that there is no
nonatomic component in $\Omega$. We can thus write $\Omega = A_1 \cup A_2 \cup \cdots$ where (i)
$A_i \cap A_j = \emptyset$ for $i \ne j$, (ii) each $A_i$ is an atom of $(\Omega, \mathcal{S}, \mu)$, and (iii) $\mu(A_i) = c_i > 0$
for each $i$. The distribution $P_n (= \mu_{\xi_n})$ is then atomic and (since $X$ is separable)
has mass concentrated in a countable set of points, say $\{a_{n1}, a_{n2}, \cdots\}$, with $P_n[a_{ni}] = c_i$
for $i = 1, 2, \cdots$.

We first assert that for each $i$, the set $D_i = \{a_{1i}, a_{2i}, \cdots\}$ has compact
closure. If not, then for some $i_0$, $D_{i_0}$ has a subset which has no limit point and
which is infinite. We can assume without losing generality that this subset is
$D_{i_0}$ itself and that all the $a_{n i_0}$ are distinct. If then $D \subset D_{i_0}$ is any subset, then
$D$ is closed and from $P_n \Rightarrow P$ it follows that $P(D) \ge \limsup_{n \to \infty} P_n(D)$. If $D$ is
infinite then $\limsup_{n \to \infty} P_n(D) \ge c_{i_0}$. Thus for any infinite subset $D \subset D_{i_0}$,
$P(D) \ge c_{i_0} > 0$, which is a contradiction, since $D_{i_0}$ contains infinitely many
pairwise disjoint infinite subsets while $P$ has total mass one.

Thus each $D_i$ has compact closure. We can then, by the diagonal procedure,
choose a sequence $\{n_k\}$ of integers and points $a_1, a_2, \cdots$ of $X$ such that

$\lim_{k \to \infty} a_{n_k i} = a_i$

for $i = 1, 2, \cdots$. Let $\xi$ be the random variable with values $a_1, a_2, \cdots$ on the
sets $A_1, A_2, \cdots$. We complete the proof by showing that $P = \mu_\xi$. It is enough
to show that $P_{n_k} \Rightarrow \mu_\xi$. In fact for any bounded continuous $g$ on $X$,

$\int_X g\,dP_{n_k} = \sum_i c_i\,g(a_{n_k i}) \to \sum_i c_i\,g(a_i) = \int_X g\,d\mu_\xi,$

the passage to the limit being justified as $\sum_i c_i\,g(a_{n_k i})$ converges uniformly in $k$.
This completes the proof of the theorem.

Remarks. (1) Suppose $X$ is any separable metric space and $X^*$ its completion.
The above theorem will still be true not for $X$ but for $X^*$, and $\xi$ will now be
$X^*$-valued. If then $X$ has the property that as a subset of $X^*$ it is measurable with
respect to the completion of every measure on $X^*$, $\xi$ can be reduced to an $X$-valued
random variable and the main theorem is true for such $X$. This is the case for
instance when $X$ is itself a Borel set in $X^*$. It is interesting to note that there are
separable metric spaces $X$ which have the above mentioned property in relation
to $X^*$ but which are not complete under any metrization, for example, the set
of rationals with the relative real line topology.

(2) It is to be noted that when $(\Omega, \mathcal{S}, \mu)$ is purely atomic the theorem is true
with any separable $X$.

(3) Suppose now $A_1, A_2, \cdots$ is a sequence of sets in $\mathcal{S}$ such that $\mu(A_n) \to \alpha$.
Setting $\xi_n = \chi_{A_n}$, the characteristic function of $A_n$, we find that $\mu_{\xi_n} \Rightarrow P$ where
$P$ is the measure with masses $\alpha$ and $1 - \alpha$ at the points 1 and 0. The above
theorem then ensures the existence of $A \in \mathcal{S}$ such that $\mu(A) = \alpha$; in other words
that the range of $\mu$ is a closed subset of $[0, 1]$.


3. An example. We construct an example to show that the theorem proved in
Section 2 requires some such condition on $X$. We take for $X$ a subset of $[0, 1]$
such that (i) $\mu^*(X) = 1$, $\mu_*(X) = 0$ where $\mu$ is Lebesgue measure and (ii) $X$ contains
all points of the form $m/2^n$. For $(\Omega, \mathcal{S}, \mu)$ we take the unit interval with
Lebesgue measure. The Borel sets of $X$ are precisely the intersections with $X$
of Borel subsets of $[0, 1]$. Lebesgue outer measure on $\mathcal{B}$ is now actually a measure
over it, denoted by $\lambda$.

Suppose now $P_n$ is the measure on $\mathcal{B}$ with equal masses $1/2^n$ at the points
$m/2^n$ $(m = 1, 2, \cdots, 2^n)$. It is easy to verify that $P_n \Rightarrow \lambda$. Further each $P_n$ is
trivially induced from $\Omega$. We will now show that $\lambda$ cannot be induced from $\Omega$.
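
As a numerical illustration of the convergence of the dyadic measures to Lebesgue measure asserted here (a sketch only; the helper name `integral_under_P_n` and the test function $g(x) = x^2$ are choices made for this illustration and do not come from the paper), one can compare the integral of $g$ against $P_n$ with the Lebesgue integral $\int_0^1 x^2\,dx = 1/3$:

```python
from fractions import Fraction


def integral_under_P_n(g, n):
    """Integrate g against P_n, which puts mass 1/2**n on each point m/2**n, m = 1..2**n."""
    size = 2 ** n
    return sum(g(Fraction(m, size)) for m in range(1, size + 1)) / size


def square(x):
    """Bounded continuous test function on [0, 1]."""
    return x * x


for n in (2, 5, 10):
    value = integral_under_P_n(square, n)
    # Second column: the integral under P_n; third column: distance from 1/3.
    print(n, float(value), float(abs(value - Fraction(1, 3))))
```

A single test function does not establish weak convergence, of course; the check only makes the limiting behaviour concrete.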

Suppose $\lambda$ is induced by the map $\xi$. $\xi$ is obviously a Borel measurable function
on $[0, 1]$ and hence by Lusin's theorem ([2] p. 243) we can find for each $\epsilon > 0$
a compact $K_\epsilon \subset [0, 1]$ such that (i) $\mu(K_\epsilon) > 1 - \epsilon$ and (ii) $\xi$ restricted to $K_\epsilon$ is
continuous. If $M_\epsilon = \xi\{K_\epsilon\}$, then $M_\epsilon \subset X$ and is a compact subset of the real
line. Since $\lambda$ is induced by $\xi$, $\lambda(M_\epsilon) > 1 - \epsilon$. But $M_\epsilon$ is a Borel set of the real line
and this shows that $\mu(M_\epsilon) > 1 - \epsilon$, contradicting the assumption that $\mu_*(X) = 0$.

Thus $\lambda$ cannot be induced from $\Omega$. This completes the discussion of the example.
CORRECTION TO “PROBABILITIES OF HYPOTHESES AND 


INFORMATION-STATISTICS IN SAMPLING FROM 
EXPONENTIAL-CLASS POPULATIONS” 


By Morton Kupperman
The George Washington University


In the paper cited in the title (Ann. Math. Stat., Vol. 29 (1958), pp. 571-575): 
p. 572, line 5. For >-xp(x, 6,) read aa p(x, 9,,). 





CORRECTION TO “POWER FUNCTIONS OF THE GAMMA 
DISTRIBUTION” 


By G. D. Berndt


Professor I. R. Savage has called to my attention, through the Editor, the
fact that I have overlooked reference to previous work appearing in Eisenhart,
Hastay, and Wallis, Techniques of Statistical Analysis, and bearing on results
reported by me in the Annals, Vol. 29, No. 1, March 1958, pages 302-306.

On pages 274-275 of Eisenhart, Hastay, and Wallis, in Figures 8.1 and 8.2,
there are given operating characteristic curves for the chi-squared distribution
for eight selected degrees of freedom when the significance level is 0.01 and 0.05.
Inasmuch as the chi-squared distribution is a gamma distribution with 1/2 (degrees
of freedom) = the parameter gamma in my paper and with 2 = the parameter
beta in my paper, and since their rho is equivalent to my delta, there is a similarity
in the reported results. This similarity has resulted in some overlap in the
results of the two papers in that ten of my forty-eight power curves have an
equivalent in the operating characteristic curves in the previous work.

I should like to acknowledge this previous work, and also that of Ferris,
Grubbs, and Weaver, by having the following two references added to the two
which already appear at the end of my paper:


[3] Selected Techniques of Statistical Analysis, Churchill Eisenhart, Millard W. Hastay,
and W. Allen Wallis, editors, McGraw-Hill, New York, 1947, pp. 270-278.

[4] Charles D. Ferris, Frank E. Grubbs, and Chalmers L. Weaver, "Operating characteristics
for some common statistical tests of significance," Annals of Mathematical
Statistics, Vol. 17 (1946), pp. 178-197.
ABSTRACTS OF PAPERS


Abstracts of papers presented for the Cambridge, Massachusetts Meeting 
of the Institute, August 25-28, 1958 


48. On the Relationship Algebra and the Association Algebra of the Partially
Balanced Incomplete Block Design. Junjiro Ogawa, University of
North Carolina.


A. T. James (1957) defined the so-called "relationship algebra of a design" and showed
that the partition of the total sum of squares into partial sums of squares can be characterized
by the structure of the relationship algebra. He constructed the relationship algebras
and analyzed their algebraic structures for the randomized block, the Latin-square
and the balanced incomplete block design. The purpose of this note is to construct the
relationship algebra for the partially balanced incomplete block design and analyze its
algebraic structure. The main result of this paper is that the second degree irreducible
representations of the relationship algebra are completely determined by the irreducible
(linear) representations of the association algebra defined by R. C. Bose. (Received June 2,
1958.)


49. Estimation of the Medians for Dependent Variables. Olive Jean Dunn,
University of California.


The problem considered in this paper is that of using non-parametric methods to estimate
the unknown medians of two dependent variables. In various types of research, it
is convenient to consider a sample of n individuals and to take measurements at two different
times or at two different levels of treatment. These 2n measurements are then a
sample of size n from a bivariate distribution. For independent variables, two confidence
intervals of the classic type using order statistics may be used as simultaneous confidence
intervals for the two medians by simply multiplying the two probabilities to obtain the
new confidence level. In this paper it is shown that for two dependent variables these
same confidence intervals may be used as a set with bounded confidence level. Comparisons
are made on the basis of average length between these intervals and other joint intervals
for the means of a bivariate normal distribution. It is also shown that the result of
this paper does not generalize, at any rate in the most obvious way, to three or more dependent
variables. (Received June 16, 1958.)


50. On the Problem of Incomplete Data. (Preliminary report) Junjiro Ogawa
and Bernard S. Pasternack, University of North Carolina.


Consider a sample of size $n$, $x_1, x_2, \cdots, x_n$, drawn from $N(\mu, \sigma^2)$ ($\sigma^2$ known). Suppose
only $n - k$ observations are available, $x_1, x_2, \cdots, x_{n-k}$ (say). Let
$\bar{x} = (1/n)\sum_{i=1}^{n} x_i$, $\bar{x}^* = (1/(n-k))\sum_{i=1}^{n-k} x_i$, $u = (\sqrt{n}\,\bar{x})/\sigma$, $u^* = (\sqrt{n-k}\,\bar{x}^*)/\sigma$,
and define $u(\alpha)$ by $P_{H_0}(|u| \le u(\alpha)) = 1 - \alpha$ and $u^*(\alpha)$ by $P_{H_0}(|u^*| \le u^*(\alpha)) = 1 - \alpha$,
where $H_0: \mu = \mu_0$. We define $P_\mu\{|u| \le u(\alpha) \mid |u^*| > u^*(\alpha)\}$, $\alpha$ being prefixed, as the reversal
function of this test procedure. The reversal function has been tabulated for various
values of $k/n$. When $\sigma^2$ is unknown, the test procedure depends upon $t$, or, for incomplete
data, $t^*$. A least upper bound for $t$ given $t^*$ has been obtained, i.e., the minimum value
of $\tau$ such that $P_{H_0}(|t| > \tau \mid t^*) = 0$. Similar bounds (both l.u.b. and g.l.b.), in probability,
have also been obtained for situations involving one-way classifications, the general linear
model and Hotelling's $T^2$. Another approach to the problem of missing data involves the
introduction of a chance mechanism according to which observations are missed. Research
along these lines is now in progress and the authors hope to present some of these results
in the near future. (Received June 20, 1958.)


51. Aids for Fitting the Pearson Type III Curve by Maximum Likelihood.
(Preliminary report) J. Arthur Greenwood, Iowa State College and
David Durand, M.I.T.

New tables and formulas of approximation are given for the function $\rho = y\,\phi(y)$, where
$\phi(y)$ is the inverse function to $y = \ln p - \frac{d}{dp} \ln \Gamma(p)$. With the aid of these tables, one
may obtain by direct interpolation the maximum likelihood estimate (joint) of the exponent
in a Type III distribution with known lower limit. Application of the tables to
the Type III with unknown lower limit and to the Type V are briefly discussed. (Received
June 20, 1958.)
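
Where such tables are not at hand, the inverse function can be evaluated numerically. The sketch below is not the authors' method and does not reproduce their approximation formulas; it assumes that $\frac{d}{dp}\ln\Gamma(p)$ (the digamma function) is computed by the standard recurrence followed by its asymptotic series, and it inverts $y = \ln p - \frac{d}{dp}\ln\Gamma(p)$ by bisection, using the fact that the right side decreases from $+\infty$ to 0 as $p$ increases. The names `digamma`, `y_of_p` and `phi` are hypothetical.

```python
import math


def digamma(x):
    """d/dx ln Gamma(x) for x > 0: shift x upward by recurrence, then apply the asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def y_of_p(p):
    """y = ln p - d/dp ln Gamma(p); positive and strictly decreasing for p > 0."""
    return math.log(p) - digamma(p)


def phi(y, lo=1e-6, hi=1e6):
    """Invert y_of_p by bisection on a logarithmic scale (bracket chosen arbitrarily)."""
    if not (y_of_p(hi) < y < y_of_p(lo)):
        raise ValueError("y lies outside the bracketed range")
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if y_of_p(mid) > y:
            lo = mid  # y_of_p is decreasing, so the root lies above mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


y = y_of_p(2.0)
print(y, phi(y), y * phi(y))  # y, the recovered exponent, and the product y * phi(y)
```

In a maximum likelihood fit with known lower limit, $y$ would be the logarithm of the ratio of the arithmetic to the geometric mean of the observations measured from that limit; that use is stated here as background rather than taken from the abstract.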


52. Admissible Estimates and Maximum Likelihood Estimates (Preliminary
report) Allan Birnbaum, Columbia University.


A definition of admissibility of a point-estimate of a real-valued parameter $\theta$ is formulated
on the basis of a slightly generalized form of the Neyman-Pearson theory of confidence
regions, using Ann. Math. Stat., Vol. 27 (1956), pp. 544-545, without introduction
of loss functions. Necessary and sufficient conditions for existence of such estimates are
given under mild regularity conditions. By extending methods developed in Ann. Math.
Stat., Vol. 26 (1955), pp. 21-36, it is shown that each admissible estimate is obtainable
as the (unique) solution of an equation $\frac{\partial}{\partial\theta} \log L(x, \theta) = G(\theta)$, where $G(\theta)$ is a known
function and $L(x, \theta)$ is the likelihood function. Setting $G(\theta) = 0$ gives the maximum likelihood
estimate $\hat\theta$, which is thus shown to be admissible. In the case of non-existence of
admissible estimates, asymptotically admissible estimates are defined and shown under
certain conditions to exist and to include $\hat\theta$. An estimate $\theta^*$ is called median-unbiased if
Prob $\{\theta^* < \theta \mid \theta\} = \frac{1}{2}$ for all $\theta$. $\hat\theta$ is shown under general conditions to be asymptotically
median-unbiased, and to be a convenient approximation (often close for moderate sample
sizes) to the median-unbiased admissible estimate (which is often difficult to compute).
Relations to sufficiency and to multi-parameter estimation problems are discussed.
(Received June 24, 1958.)


53. Stochastic Models for the Electron Multiplier Tube (Preliminary report)
Edward K. Dalton, Willard D. James and Howard G. Tucker,
University of California.


Four stochastic models are proposed for the electron multiplier tube, two being branching
processes involving Poisson distributions and two being branching processes involving
binomial distributions. In each model there are two unknown parameters. It is desired
to determine the best model among these and to estimate the parameters for it. Although
the probability generating functions in each case are easy to derive, explicit formulas
for the probability distributions in each case could not be found. A method for testing
these models is presented which is based on the following theorem. THEOREM. Let $X$ be
a random variable which takes on non-negative integer values, and let $X_1, \cdots, X_n, \cdots$ denote
an infinite sequence of independent observations on $X$. Let $g(u \mid \alpha) = E(u^X)$ be the probability
generating function of $X$, which depends on a (vector) parameter $\alpha$ and is continuous
in $\alpha$. Let $\alpha_0$ be the true value of $\alpha$, and assume that there exists a sequence $\{\hat\alpha_n\}$ of random
variables which converges to $\alpha_0$ with probability one. Then for any value of $u$ for
which $u^2 - u \ne 0$ and $g(u^2 \mid \alpha) < \infty$ there exists a subsequence $\{\hat\alpha_{N_n}\}$ of $\{\hat\alpha_n\}$ such that the
limiting distribution of the ratio of $\sum\{u^{X_k} \mid 1 \le k \le n\} - n\,g(u \mid \hat\alpha_{N_n})$ to either the square
root of $n[g(u^2 \mid \hat\alpha_{N_n}) - g^2(u \mid \hat\alpha_{N_n})]$ or to the square root of

$n\bigl(n^{-1}\sum\{u^{2X_k} \mid 1 \le k \le n\} - (n^{-1}\sum\{u^{X_k} \mid 1 \le k \le n\})^2\bigr)$

is normal with mean zero and variance one. A resumé of numerical results is included for
three different sets of data corresponding to three different energy inputs. (Received
June 26, 1958.)


54. On the Choice of Sample Size in the Kolmogorov-Smirnov Tests. Judah
Rosenblatt, Purdue University.


If $F_n$ is the empirical distribution based on independent random variables $X_1, \cdots, X_n$
with common c.d.f. $F$, it is well known that a test of the hypothesis $H_0: F = F_0$ having
asymptotic probability of type one error not exceeding $\alpha$ is to reject $H_0$ if and only if
$n^{1/2} d_1(F_n, F_0) = n^{1/2} \sup_x |F_n(x) - F_0(x)| > h_{1\alpha}$, where

$\lim_{n \to \infty} P_F\{n^{1/2} d_1(F_n, F) > h_{1\alpha}\} = \alpha$

if $F$ is continuous. Massey has suggested that the sample size $n$ needed to achieve

$P_F\{\text{Reject } H_0\} \ge 1 - \beta$ when $d_1(F_0, F) \ge \Delta$

be chosen as follows: $n$ is the smallest integer such that $2[n^{1/2}\Delta - h_{1\alpha}] \ge c_\beta$, where

$\int_{-\infty}^{c_\beta} (2\pi)^{-1/2} e^{-t^2/2}\,dt = 1 - \beta.$

This suggestion is motivated by the normal approximation to the binomial distribution.
A thorough investigation is made of this suggested procedure, and a completely justified,
still rather simple technique is devised for choosing $n$ such that

$P_F\{\text{Reject } H_0\} \ge 1 - \beta$ when $d_1(F_0, F) \ge \Delta.$

The investigation is in two parts. First a region (near $p = \frac{1}{2}$) is determined where

$\sum_{\nu=0}^{[n(p+\Delta) - n^{1/2} h_{1\alpha}]} \binom{n}{\nu} p^\nu (1 - p)^{n-\nu}$

takes on its minimum value. This, together with the Uspensky version of the normal
approximation to the binomial (with correction and error term), leads to the justified
procedure for choosing $n$ with the desired properties. This $n$ is not much larger than that
suggested by Massey and is far smaller than the one derivable from Chebychev's inequality.
(Received July 2, 1958.)


55. The Use of Sample Quasi-Ranges in Estimating Population Standard 
Deviation. H. Leon Harter, Wright Air Development Center. 


The use of sample quasi-ranges in estimating the standard deviation of normal, rec- 
tangular, and exponential populations is discussed. For the normal population, the ex- 
pected value, variance, and standard deviation of the rth quasi-range for samples of size 
n are tabulated for r = 0(1)8 and n = (2r + 2)(1)100. The efficiency of the unbiased esti- 
mate of population standard deviation based on one sample quasi-range is tabulated for 
the same values of r and n. Estimates based on a linear combination of two quasi-ranges 
are considered, and a method is given for determining the weighting factor which maxi- 
mizes the efficiency. The most efficient unbiased estimates based on one quasi-range for 
n = 2(1)100 and on linear combinations of two adjacent quasi-ranges and of any two
quasi-ranges ($r < r' \le 8$) for n = 4(1)100 are tabulated, along with their efficiencies. An example
illustrates the use of these estimates. For rectangular and exponential populations, the
most efficient unbiased estimates based on one quasi-range are tabulated, together with
their efficiencies, also the bias when estimates which assume normality are used. (Received
July 2, 1958.)


56. On a Limiting Distribution Due to Renyi. D. G. Chapman, University of
Washington.


Let $X$ be a real valued random variable with distribution function (d.f.) $F(x)$. Let $F_n(x)$
denote the empirical d.f. based on $n$ independent observations $x_1, x_2, \cdots, x_n$ of $X$. Renyi
("On the theory of order statistics," Acta Math., Acad. Sci. Hungary, Vol. 4 (1953), pp. 191-231)
has given the limiting distribution of $n^{1/2} R_n(a) = n^{1/2} \sup_{F(x) \ge a} [F_n(x) - F(x)]/F(x)$
as $n$ tends to infinity, $a$ being an arbitrary positive constant. It is therefore of interest to
determine the limiting distribution of $R_n(0)$, i.e., without the arbitrary restriction $F(x) \ge a$.
The result is obtained that $P\{R_n(0) \le \epsilon\} = \epsilon/(1 + \epsilon)$ for all $n$, so that the limiting
distribution of $R_n(0)$ has the same form. Also studied in this paper are the limiting distributions
of some slight generalizations of $R_n(a)$. The method used is that due to Doob, which is
simpler than Renyi's and may also be used to determine the asymptotic power of the Smirnov
test of goodness-of-fit for certain alternatives. (Received July 2, 1958.)
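
The distribution-free statement that the probability of $R_n(0) \le \epsilon$ equals $\epsilon/(1+\epsilon)$ for every $n$ can be checked by simulation. The sketch below is an illustration and not part of the abstract: it assumes $F$ is uniform on $(0, 1)$ (which loses no generality for continuous $F$), uses the fact that $[F_n(x) - F(x)]/F(x)$ is largest at the order statistics, where $F_n$ jumps, and picks the sample size, $\epsilon$, the number of replications and the seed arbitrarily. The function names are hypothetical.

```python
import random


def r_n_zero(sample):
    """sup over x of [F_n(x) - x] / x for a sample from the uniform distribution on (0, 1]."""
    n = len(sample)
    ordered = sorted(sample)
    # F_n equals i/n at the i-th order statistic and is flat in between,
    # so the ratio F_n(x)/x is maximised at one of the order statistics.
    return max((i / n) / u for i, u in enumerate(ordered, start=1)) - 1.0


def estimate_probability(n, eps, trials, seed=0):
    """Monte Carlo estimate of Prob{R_n(0) <= eps}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], which avoids division by zero.
        sample = [1.0 - rng.random() for _ in range(n)]
        if r_n_zero(sample) <= eps:
            hits += 1
    return hits / trials


for n in (1, 5, 25):
    print(n, estimate_probability(n, eps=1.0, trials=20000, seed=n))  # each should be near 1/2
```

With $\epsilon = 1$ the stated value is $1/2$ whatever the sample size, which is what makes the agreement across different $n$ a useful check.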


57. Power and Control of Size of Some Optimal Welch-type Statistics. Roger
S. McCullough and John Gurland.


A Welch-type statistic (Welch, Biometrika, 1938) is considered for testing equality of
means in two normal populations with unknown variances which may be unequal. For
various combinations of small sample sizes a nearly perfect one-sided control of size is
possible, that is, optimal statistics are available which keep the size extremely close to a
preassigned level if one population has a larger variance than the other. For two-sided
control of size, that is, with no restriction on the direction of inequality of variances, optimal
statistics are available which keep the size below a preassigned level but arbitrarily
close to the level over an infinite range of variance values. A table giving the optimal
statistics for various combinations of small sample sizes has been prepared with the aid of
an electronic computer. Tables of the power are also included. (Received July 2, 1958.)


58. A Note on Estimating Translation and Scalar Parameters. Joseph A.
Dubay, University of Oregon.


Let $X = (X_1, \cdots, X_n)$ be a random variable whose distribution depends on an unknown
real valued parameter $\theta$. Let $\delta(X)$ be an estimator of $\theta$, $\Gamma$ be the class of all maximal translation
invariant functions of $X$, and assume the loss in estimating $\theta$ by $\delta(X)$ is $k(\delta(X) - \theta)$.
A necessary and sufficient condition that among all estimators of the form $\delta(X) + v\gamma(X)$,
$\gamma \in \Gamma$, $v$ constant, $\delta(X)$ uniquely minimize the risk is given and an explicit construction of
the minimum risk estimator is derived therefrom. In the particular case where $\delta(X)$ has
the translation property, the class of estimators of the form $\delta(X) + v\gamma(X)$ is the class of
all estimators having the translation property. Thus, a construction of the minimum risk
estimator having the translation property is exhibited of which the constructions given by
Pitman (1939) and Blackwell and Girshick (1954) in the case where $\theta$ is a translation parameter
are special cases. An example is given in which $\theta$ is not a translation parameter in
the usual sense but estimators having the translation property are naturally admitted.
Under an appropriate transformation the results are applicable to the estimation of scalar
parameters. (Received July 2, 1958.)
59. The Moments of the Maximum of Partial Sums of Independent Random
Variables. John S. White, Minneapolis-Honeywell Regulator Co.


Let $X_1, X_2, \cdots$ be independent identically distributed random variables. Let
$S_i = \sum_{j=1}^{i} X_j$, $S_i^+ = \max(0, S_i)$, $S_n^* = \max_{i \le n}(S_i^+)$, $m_k(i) = E((S_i^+)^k)$ and
$M_k(n) = E((S_n^*)^k)$. By successive differentiation of Spitzer's Theorem (Trans. Amer. Math. Soc., Vol.
82, 1956) the following recursion relation for the moments of $S_n^*$ is obtained:

$n\,M_k(n) = \sum_{h=1}^{n} \sum_{j=0}^{k} \binom{k}{j}\, m_j(h)\, M_{k-j}(n - h).$

(Received July 2, 1958.)


60. A Characterization of Triangular Association Scheme. S. S. Shrikhande,
University of North Carolina. (By title)


If a partially balanced design with two associate classes for $v = n(n - 1)/2$ is triangular
(Bose and Shimamoto, J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-190), then its parameters
are given by $v = n(n - 1)/2$, $n_1 = 2n - 4$, $p_{11}^1 = n - 2$, $p_{11}^2 = 4$. Connor (Ann. Math. Stat.,
Vol. 29 (1958), pp. 262-266) has proved that if $n \ge 9$, a design with above parameters is
necessarily triangular. The following Lemma is established and it is utilized to prove that
Connor's result is true for $n = 5, 6$ as well.

LEMMA: If for a design with above parameters, the 1-associates of any treatment $\theta$ can
be divided into two sets $(y_1, y_2, \cdots, y_{n-2})$, $(z_1, z_2, \cdots, z_{n-2})$ such that $(y_i, y_j) = (z_i, z_j)
= (y_i, z_i) = 1$ and $(y_i, z_j) = 2$, $i \ne j = 1, 2, \cdots, n - 2$, then the design is triangular.
(Received July 2, 1958.)
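
The parameter values quoted for the triangular scheme can be confirmed by direct enumeration. The following sketch is illustrative only and is not part of the abstract; it takes the usual definition of the triangular scheme (treatments are the unordered pairs of $n$ symbols, and two treatments are first associates when they share exactly one symbol), and the function name is hypothetical. Each of $n_1$, $p_{11}^1$ and $p_{11}^2$ is returned as the set of values observed, so a single-element set confirms that the count does not depend on the treatments chosen.

```python
from itertools import combinations


def triangular_parameters(n):
    """Return (v, n1 values, p^1_11 values, p^2_11 values) for the triangular scheme, n >= 4."""
    treatments = list(combinations(range(n), 2))
    # First associates of t: pairs sharing exactly one symbol with t.
    first = {
        t: {s for s in treatments if len(set(s) & set(t)) == 1}
        for t in treatments
    }
    n1 = {len(first[t]) for t in treatments}
    # Common first associates of two treatments that are themselves first associates...
    p1_11 = {len(first[s] & first[t])
             for s, t in combinations(treatments, 2) if s in first[t]}
    # ...and of two treatments that are second associates (disjoint pairs).
    p2_11 = {len(first[s] & first[t])
             for s, t in combinations(treatments, 2) if s not in first[t]}
    return len(treatments), n1, p1_11, p2_11


for n in (5, 6):
    print(n, triangular_parameters(n))
```

The enumeration verifies only that a triangular design has these parameters; the converse, which is the subject of the abstract, is not something such a check can establish.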


61. A Problem in Two-Stage Experimentation. (Preliminary Report) Donald
L. Richter, University of North Carolina.


Let $N_1$ and $N_2$ be two normal populations with unknown variances and an unknown but
common mean $\mu$; it is desired to estimate $\mu$ using a fixed number $n$ of observations. For this
problem, a two-stage sampling procedure is proposed in which $m$ observations are taken
from each of $N_1$ and $N_2$ in the first stage and, depending on the observed values, $n - 2m$
observations are taken from one or the other population in the second stage. Associated
with an estimator of $\mu$, a risk function is defined which is equal to the variance of the estimator
multiplied by a suitable stabilizing factor. For a particular unbiased estimator, it is
shown that the minimax value of $m$ is asymptotically equal to $cn^{2/3}$. Extensions in several
directions are being studied. (Received July 2, 1958.)


62. Tests for the Validity of an Exponential Distribution of Life. Benjamin
Epstein, Stanford University. (By title)


In this paper a number of procedures are given for testing, on the basis of life test data,
whether there are substantial departures from an exponential distribution of life. The
particular procedure that one should adopt depends on the class of alternatives one is
testing against. A number of the tests are based in an essential way on fundamental properties
of Poisson processes. (Received July 2, 1958.)


63. Stochastic Models for Length of Life. Benjamin Epstein, Stanford University.
(By title)


Various models for length of life are considered in this paper. Among these are (1) models
which we call exponential (these involve Poisson processes and appropriate generalizations
of such processes); (2) models based on the conditional probability of failure function;
(3) extreme value models. Implications of and interrelations among the various models
are discussed. Many examples are given. As examples of models (1) and (2) one may cite
the recent paper by Z. W. Birnbaum and S. C. Saunders (J. Amer. Stat. Assn., Vol. 53 (1958),
pp. 151-160) in which they give a statistical model for the life length of structures under
dynamic loading (i.e., fatigue) and a recent report by George H. Weiss in which it is shown
that some kinds of mechanical failure, such as creep failure of oriented polymeric filaments
under tensile stresses, can be viewed as "pure death" processes. An example of where
model (3) may be relevant is in phenomena involving corrosion. (Received July 2, 1958.)


64. Truncation and Tests of Hypotheses. Om P. Aggarwal and Irwin
Guttman, Purdue University and Princeton University.


Consider a normal distribution with variance $\sigma^2$ and a sample from the distribution
obtained from this normal distribution by truncating it at the same distance $a$ on both sides
of the mean. The distribution of the sample mean for sample sizes up to 4 is obtained explicitly
and the results of applying the usual tests of hypotheses for one-sided testing of
the mean of a normal distribution are examined when $a$ and $\sigma^2$ are known. Some tables are
given and it is found that the loss in power decreases very rapidly with the distance of the
alternative value of the mean from the one tested and also with the distance of the truncation
from the mean. (Received July 2, 1958.)


65. Mathematical Outline of Polyvariable Analysis (Including Random Balance).
F. E. Satterthwaite, Statistical Engineering Institute.


A polyvariable technique for statistical analysis is defined as any estimation procedure
applied to the linear model, $Y = BZ + E = BZ + E1 = AX$, $A = (B, E)$, $X = (Z, 1)$,
which gives estimates for all (or for some) of the $A$ unknowns with associated confidence
limits that are valid and finite without restrictions on the number of $A$ unknowns in the
model. Specifically the number of unknowns may exceed, and often will greatly exceed,
the number of data sets. The theoretical minimum number of data sets is 2. The necessary
minimum for a specific application to give useful precision depends primarily on the
signal-to-noise ratio for the available data. In many types of applications satisfactory precisions
are obtained with 5 to 30 data sets for models containing a large number of unknowns. This
paper is a mathematical outline of method and justification (including, in most cases,
formal proofs) for the more important classes of polyvariable methods: (1) Polygression,
(2) Bigression, (3) Quadratic, (4) Homovariance, (5) Hetervariance, (6) Random Balance,
(7) Split Data. (Received July 3, 1958.)


66. Statistical Theory of Some Quantal Response Models. Allan Birnbaum,
Columbia University. (By title)

Let $V = (S_1, \ldots, S_k)$, where the $S_g$'s are independent Bernoulli observations: Prob $\{S_g = 1\} = P_g(y)$,
a known strictly-increasing function of the unknown real-valued parameter $y$,
Prob $\{S_g = 0\} = Q_g(y) = 1 - P_g(y)$, for $g = 1, \ldots, k$. If $P_g(y)$ depends on known
parameters $a_g, b_g, \ldots$, whose values the experimenter may determine, these are called “design
parameters.” Fisher's (Phil. Trans. Roy. Soc. London, A, Vol. 222 (1922), pp. 363-366)
method in treating estimation and design problems in the dilution series model
($P_g(y) = 1 - \exp(-a_g y)$, $g = 1, \ldots, k$) is formulated more explicitly, particularly the use of the
practical equivalence of designs having similar information curves $I(y) = \sum_g I_g(y)$, where
$I_g(y) = (\partial/\partial y\, P_g(y))^2 / P_g(y) Q_g(y)$. The “information area” $\int I(y)\, dy$ is introduced and
used in various design problems. Point- and interval-estimation, hypothesis-testing, and
other inference problems, and related problems of design and comparison of experiments,
are treated, using efficient or simpler less efficient statistics, with examples from mental
tests, industrial gauging, genetics and special analytical bioassays. It is shown that a
necessary and sufficient condition for existence of a sufficient statistic is that, in terms
of $z = z(y) = \log P_1(y)/Q_1(y)$, the model have the logistic form:
$P_g(y) = (1 + \exp(-a_g z - b_g))^{-1}$ for $g = 1, \ldots, k$;
then $\sum_g a_g S_g$ is sufficient. (Invited address given at Los Angeles meeting, December, 1957.
Received July 7, 1958.)
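
The information curve of the dilution series model can be evaluated directly from the definition above. The following is a minimal sketch under stated assumptions: the function name and the example design constants are hypothetical, and only the formulas $P_g(y) = 1 - \exp(-a_g y)$ and $I_g(y) = (\partial P_g/\partial y)^2 / P_g Q_g$ come from the abstract.

```python
import math

def dilution_information(y, a):
    """Information curve I(y) = sum_g (dP_g/dy)^2 / (P_g * Q_g)
    for the dilution series model P_g(y) = 1 - exp(-a_g * y), with y > 0.

    `a` is a list of design constants a_g (hypothetical values in the demo)."""
    total = 0.0
    for a_g in a:
        q = math.exp(-a_g * y)   # Q_g(y) = exp(-a_g * y)
        p = 1.0 - q              # P_g(y)
        dp = a_g * q             # derivative of P_g with respect to y
        total += dp * dp / (p * q)
    return total

if __name__ == "__main__":
    # Two hypothetical designs compared at a few parameter values.
    for y in (0.5, 1.0, 2.0):
        print(y, dilution_information(y, [1.0, 1.0]), dilution_information(y, [0.5, 2.0]))
```

Each term simplifies to $a_g^2 e^{-a_g y}/(1 - e^{-a_g y})$, which gives a quick hand check of the output.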


67. Statistical Theory of Tests of a Mental Ability. Allan Birnbaum, Columbia
University. (Invited paper)


Several writers (F. Lord, Psychometrika, Vol. 18 (1953), pp. 57-76, and references therein)
have studied the following model of a mental-ability test consisting of $k$ items: Let $S_g = 1$
or 0 as a subject's response to item $g$ is correct or not, $g = 1, \ldots, k$. Then if a subject has
ability $y$, the probability that his response pattern will be $V = (S_1, \ldots, S_k)$ is
$\prod_{g=1}^{k} P_g(y)^{S_g} Q_g(y)^{1 - S_g}$,
where $P_g(y) = \Phi(a_g y - b_g)$, $g = 1, \ldots, k$, and $\Phi(u)$ is the standard normal c.d.f. Assuming
item-parameters $a_g$, $b_g$ known, problems of inference and design (choice of $k$, $a_g$'s, $b_g$'s)
have been treated, as have Bayesian problems with $y$ distributed according to $\Phi(y)$.
Replacing $\Phi(u)$ by the logistic c.d.f. $\Psi(u) = (1 + \exp(-u))^{-1}$ gives a more tractable, perhaps
equally valid, “logistic test model”: $t = \sum_g a_g S_g$ is a sufficient statistic, typically nearly
normal for each $y$; hence a design $(a_1, b_1; \ldots; a_k, b_k)$ is practically characterized by its
“information curve” $I(y) = d/dy\, E(t \mid y) = \mathrm{var}(t \mid y)$. If $I(y) = c\,\Psi'(ay - b)$ for some $a$, $b$, $c$
(as in cases of principal interest), properties of Bayes estimates $E(y \mid t)$ are given as
functionals of the c.d.f. of a weighted sum of two independent (Fisher's) $z$ variables; numerical
illustrations are given. A simple efficient method of estimating $a_g$'s, $b_g$'s is given. (Received
July 7, 1958.)


68. On Logistic Order Statistics. Allan Birnbaum, Columbia University.
(By title)


Plackett (Ann. Math. Stat., Vol. 29 (1958), pp. 131-142) has demonstrated the usefulness
and tractability of logistic order statistics in treating problems involving order statistics 
from various distributions. The present more descriptive investigation of logistic order 
statistics, a by-product of development of statistical theory of a “logistic model” of ability
tests, is a contribution to the comparative study of order statistics initiated by Hastings 
et al. (Ann. Math. Stat., Vol. 18(1947), pp. 413-426). Because with suitable choice of scale 
parameter the logistic c.d.f. approximates the standard normal c.d.f. with error < .01, 
the logistic model is of interest, and may be sometimes preferred, when equally plausible, 
to the more usual (but less tractable, as regards order statistics) normal model of the
population sampled. The presentation illustrates the effect on order statistics of such a 
change of parametric assumptions. Tables and graphs compare means and variances of 
logistic and normal order statistics for various sample sizes. The tractability of asymptotic 
variance and covariance formulae, and of some distributions related to extreme values, 
is illustrated. The distribution of each logistic order statistic coincides (to within a
scale-factor) with a certain Fisher's $z$ distribution, for which extensive tables and
approximation methods are available. (Received July 7, 1958.)
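
The claim that a suitably scaled logistic c.d.f. stays within .01 of the standard normal c.d.f. can be checked numerically. The sketch below is illustrative only: the scale constant 1.702 is an assumption taken from common practice rather than from the abstract, and the grid search is a crude stand-in for an exact maximization.

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x, scale=1.702):
    """Logistic c.d.f. with an assumed scale constant (1.702 is not from the abstract)."""
    return 1.0 / (1.0 + math.exp(-scale * x))

def max_abs_gap(scale=1.702, lo=-8.0, hi=8.0, steps=16000):
    """Largest absolute difference between the two c.d.f.s on an evenly spaced grid."""
    h = (hi - lo) / steps
    return max(abs(normal_cdf(lo + i * h) - logistic_cdf(lo + i * h, scale))
               for i in range(steps + 1))

if __name__ == "__main__":
    print("scale 1.702:", max_abs_gap(1.702))
    # Matching variances instead (scale pi/sqrt(3)) gives a visibly worse fit.
    print("scale pi/sqrt(3):", max_abs_gap(math.pi / math.sqrt(3.0)))
```

The comparison of the two scales shows why the choice of scale parameter matters for the approximation.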


69. Industrial Experience with Fractional Replicates. Cuthbert Daniel.
(Invited paper)


Typical conditions of industrial experimentation (including numbers of factors
simultaneously studied, number and magnitude of effects sought, and restrictions on time
and costs) are reviewed. For meeting these, the sequential use of nested sub-fractions of
fractional replicate designs in the $2^{p-q}$ series is described. Since generally choice of a most
informative initial sub-fraction is incompatible with choice of a most informative complete
fractional replicate, the relative merits of each type, and of intermediate types, are
discussed. A number of sequential designs are given. Methods are recommended and
illustrated for inspection and criticism of data from $2^{p-q}$ experiments by using the graph
(on appropriate probability paper) of the empirical c.d.f. of absolute values of contrasts,
to detect one or two mavericks, inadvertent plot-splitting, antilognormal data, and the
presence of several real effects. The distribution of this c.d.f. is studied under several
hypotheses, and the use is described of the operating characteristics of a related statistic given
by A. Birnbaum. Partial duplication is recommended when an unbiased estimate of error
variance is required at an early stage. (Received July 7, 1958.)


70. On the Analysis of Factorial Experiments without Replication. Allan
Birnbaum, Columbia University. (By title)


Inferences from factorial experiments without replication are usually based on a formal 
assumption that certain interactions are zero. In an altogether exploratory research
situation, any statistical model giving a formal basis for informative inferences will typically
be too schematic and restrictive of unknown conditions to be claimed ‘‘valid,’’ or a basis 
for inferences which are ‘‘valid’’ except in the hypothetical formal sense; such a model is, 
perhaps along with other models, a basis for “plausible inferences,” i.e., inferences drawn 
in a formally-valid manner, based on a model which is more or less plausible. Under some 
conditions (which are reviewed), the following schematic model is usefully plausible: The 
$m$ contrasts $a_i$ are independent, normal, homoscedastic; at most (any) $r$ of their means
are non-zero. For $r = 1$, to decide which, if any, mean is non-zero, the statistic $\max_i a_i^2 / \sum_i a_i^2$
is optimal. An alternative graphical procedure developed by C. Daniel, which has important
advantages, is related to the ratio of $\max_i |a_i|$ to another ordered $|a_i|$. Critical values, power
and related properties, comparisons with more conventional statistics, and discussion of
cases $r > 1$, are given. (Received July 7, 1958.)


71. Linear Regression in the Multivariate Normal Case. Charles Stein,
University of California, Berkeley.


The problem of estimating the regression vector of one random variable on $p$ others
when all have a joint normal distribution is considered. There are $n > p + 2$ observations
on the whole vector, the mean is assumed 0 for simplicity, and the loss is taken to be the
mean squared error of prediction when the estimated regression vector is used to make a
prediction on the basis of a new random observation on the predictors, divided by the
residual variance. The usual (maximum likelihood) estimate of the regression vector is
minimax. It is admissible for $p = 1$, $n \ge 4$ and for $p = 2$ and $n$ sufficiently large. For $p \ge 3$
it is intuitively clear (by analogy with the problem of estimating the mean of a
multivariate normal distribution) that the usual estimate is not admissible. One possible method
of improvement is to multiply the usual estimate by a constant depending on the
population multiple correlation coefficient, which can be estimated from the sample coefficient.
This will be more helpful if a guessed regression or a regression on a small selected set of
predictors is first subtracted out. Other possible improvements are suggested when the
covariance matrix of the predictors is known. It should also be possible to make further
improvements when the covariance matrix of the predictors is not known but guessed or
estimated on the basis of an additional sample. (Received September 4, 1958.)


72. Some Population Estimation Models and Related Limit Distributions.
Ronald Pyke and N. Donald Ylvisaker, Stanford University.


The following two stage tag-and-sample model is studied. During stage I, $J + 1$ samples
of sizes $m_0, m_1, \ldots, m_J$ are taken from a population of size $S_1$. In each sample, all
untagged members are tagged and the sample replaced. During a later stage II, $K + 1$ samples
of sizes $(n_0, n_1, \ldots, n_K)$ are taken in each of which all tagged members are tagged with a
different tag than that used in stage I. The time interval between stages is assumed to be
large relative to the time required to perform the tagging and sampling. Constant
deterministic birth and death rates $\mu$ and $\rho$ are assumed during the intermediary time period.
Maximum Likelihood estimates of $S_1$, $\mu$ and $\rho$ are obtained under both Poisson and
Binomial assumptions on the distribution of the recovery random variables (r.v.). Some
general limit theorems are derived and applied to show that under a suitable reparametrization
(corresponding to large sample and population sizes) the recovery r.v.'s and the Maximum
Likelihood estimates are asymptotically normally distributed. A further generalization in
which the sample sizes are assumed to be r.v.'s is considered. These results are then
applied to data obtained from actual field experiments. (Received July 7, 1958.)


73. Applications of a certain Representation of the Wishart Matrix. Robert
A. Wijsman, University of Illinois.


Let $X$ be a $p \times n$ matrix ($p \le n$) whose columns are independent and distributed like
$N(0, \Sigma)$. It is known (e.g., J. G. Mauldon, J. Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp.
79-85) that the Wishart matrix $XX'$ can be written as $CTT'C'$, where $CC' = \Sigma$, $T$ is lower
triangular with independent elements $T_{ij}$, $T_{ii}$ is $\chi_{n-i+1}$ ($i = 1, \ldots, p$) and all $T_{ij}$ with
$i > j$ are $N(0, 1)$. This allows representation of any function of the Wishart matrix in terms
of independent normal and $\chi$ variables. If the population correlation between two variates
is $\rho$, the sample correlation $r$ can be represented by
$r/(1 - r^2)^{1/2} = (T_{21} + T_{11}\,\rho/(1 - \rho^2)^{1/2})/T_{22}$
(this representation was also obtained by G. Elfving, Skand. Aktuarietids., Vol. 30
(1947), pp. 56-74). This can be described as a non-central $t_{n-1}$ variable, with random
non-centrality parameter $T_{11}\,\rho/(1 - \rho^2)^{1/2}$. If the population multiple correlation between one
variate and the remaining $p - 1$ is $\bar{R}$, the sample multiple correlation $R$ can be
represented by $R^2/(1 - R^2) = ((T_{p1} + T_{11}\bar{R}/(1 - \bar{R}^2)^{1/2})^2 + T_{p2}^2 + \cdots + T_{p,p-1}^2)/T_{pp}^2$. This is a non-central
$F_{p-1,\,n-p+1}$ variable, with random non-centrality parameter $T_{11}^2 \bar{R}^2/(1 - \bar{R}^2)$. The sphericity
criterion $Z$ (T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley,
New York, 1958, section 10.7) in a bivariate population, when the hypothesis is true, can be
represented by $Z/(1 - Z) = 2T_{11}T_{22}/((T_{11} - T_{22})^2 + T_{21}^2)$, which is an $F_{2n-2,\,2}$ variable.
(Received July 7, 1958.)
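
The bivariate case of the representation is easy to verify numerically. The sketch below is a hedged illustration, not the author's computation: it assumes unit variances, takes $C$ to be the lower-triangular factor of the correlation matrix, and uses hypothetical function names. It draws $T$ as described in the abstract, forms $CTT'C'$, and compares $r/(1 - r^2)^{1/2}$ with the stated ratio.

```python
import math
import random

def sample_T(n, rng):
    """Draw the p = 2 lower-triangular T: T11 ~ chi_n, T22 ~ chi_{n-1}, T21 ~ N(0, 1)."""
    t11 = math.sqrt(rng.gammavariate(n / 2.0, 2.0))        # chi-square_n is Gamma(n/2, scale 2)
    t22 = math.sqrt(rng.gammavariate((n - 1) / 2.0, 2.0))
    t21 = rng.gauss(0.0, 1.0)
    return t11, t21, t22

def sample_corr_from_T(t11, t21, t22, rho):
    """Sample correlation computed from S = C T T' C',
    assuming C = [[1, 0], [rho, sqrt(1 - rho^2)]] (unit variances)."""
    s = math.sqrt(1.0 - rho * rho)
    b = rho * t11 + s * t21          # entry (2, 1) of C T
    c = s * t22                      # entry (2, 2) of C T
    s11, s12, s22 = t11 * t11, t11 * b, b * b + c * c
    return s12 / math.sqrt(s11 * s22)

def t_representation(t11, t21, t22, rho):
    """Right-hand side of the representation: (T21 + T11 * rho / sqrt(1 - rho^2)) / T22."""
    return (t21 + t11 * rho / math.sqrt(1.0 - rho * rho)) / t22

if __name__ == "__main__":
    rng = random.Random(1958)
    t = sample_T(10, rng)
    r = sample_corr_from_T(*t, rho=0.5)
    print(r / math.sqrt(1.0 - r * r), t_representation(*t, rho=0.5))
```

The two printed numbers agree up to rounding for any draw of $T$, since the identity is algebraic once $T$ is given.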


74. Order Statistics and Estimation. M. M. Kao, University of Minnesota. 
(Introduced by Milton Sobel) (By title)


Let (1) $f(x) = e^{-x}$, if $x > 0$, and zero otherwise, and $X_i$ be the $i$th order statistic from
a sample of $N$ independent observations from the population defined by (1). The following
results are proved. (I): Let $1 \le r_1 < r_2 < \cdots < r_p \le N$ be a set of fixed integers and
$X_{r_1}, X_{r_2}, \ldots, X_{r_p}$ be a $p$-set (subset) of the order statistics $X_1 < X_2 < \cdots < X_N$. Then
the order statistics define a Stochastic Process with $r_1, r_2, \ldots$, as the parameter set, which
has independent (but not stationary) increments. The finite-dimensional distributions of
the process in terms of its log characteristic function are given by
$\psi_{r_1, r_2, \ldots, r_p}(\xi_1, \xi_2, \ldots, \xi_p) = \log \phi_{r_1, r_2, \ldots, r_p}(\xi_1, \xi_2, \ldots, \xi_p) = \log N!/(N - r_p)! - \sum_{j=0}^{p-1} \sum_{m=r_j}^{r_{j+1}-1} \log (N - i\eta_{j+1} - m)$,
where $\eta_j = \sum_{k=j}^{p} \xi_k$, $r_0 = 0$, and $\phi$ is the ch.f. (II): Let $X_{r_i}$ and $r_i$ be defined as in (I). Then
$\{X_{r_i}, 1 \le r_i \le N\}$ forms a Markov Process in the strict sense as well as in the wide sense
(in either case, the Process is non-stationary). (III): Some problems of interest (in
physiological data) are the following: (i) The $r_i$ are random but $N$ is fixed. Suppose
Prob $\{r_k = i_k, k = 1, 2, \ldots, p \mid i_k < i_{k+1}, k = 1, 2, \ldots, p - 1\} = p_{i_1 i_2 \cdots i_p}$, where $p_{i_1 \cdots i_p} \ge 0$,
$\sum_{i_1 \cdots i_p} p_{i_1 \cdots i_p} = 1$, where $(i_1 < i_2 < \cdots < i_p) = 1, 2, \ldots, N$, $i_0 = 0$, and the $p$'s depend
on a set of constants, $(\lambda_1, \lambda_2, \ldots, \lambda_q)$. Then the $X_{r_k}$, defined similarly to those in (I), still
form a Stochastic Process whose finite-dimensional d.f.'s are determined by the ch.f.
$\phi(\xi_1, \ldots, \xi_p; \lambda_1, \ldots, \lambda_q) = \sum_{i_1 \cdots i_p} p_{i_1 \cdots i_p} \prod_{j=1}^{p} \prod_{m=i_{j-1}+1}^{i_j} [(N - m + 1)/(N - i\eta_j - m + 1)]$.
(ii) The case when $N$ is a random variable. Specifying the d.f.'s in some cases
of interest for the $r_i$, the limit d.f.'s of some linear combinations of $X_i$ are considered.
The estimation of the constants $(\lambda, \theta)$ and the distribution of the $(\hat{\lambda}, \hat{\theta})$ are treated using
the above results when in (1) $x$ is replaced by $(x - \lambda)/\theta$, $\theta > 0$. (Received July 7, 1958.)
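
The independent-increments structure in result (I) can be made concrete through the standard spacings representation of exponential order statistics, $X_k = \sum_{m \le k} E_m/(N - m + 1)$ with $E_m$ independent standard exponentials. The sketch below is an illustration under that assumption (the function names are hypothetical): it evaluates the log characteristic function as a double sum and cross-checks it against the product obtained from the spacings.

```python
import cmath
import math

def log_chf(N, r, xi):
    """Log ch.f. of (X_{r_1}, ..., X_{r_p}) for N standard exponential observations:
    log N!/(N - r_p)! minus the double sum of log(N - i*eta_{j+1} - m), with r_0 = 0."""
    bounds = [0] + list(r)
    total = math.lgamma(N + 1) - math.lgamma(N - r[-1] + 1)   # log N!/(N - r_p)!
    for j in range(len(r)):
        eta = sum(xi[j:])                                     # eta_{j+1} = xi_{j+1} + ... + xi_p
        for m in range(bounds[j], bounds[j + 1]):             # m = r_j, ..., r_{j+1} - 1
            total -= cmath.log(N - m - 1j * eta)
    return total

def chf_from_spacings(N, r, xi):
    """The same ch.f. built from X_k = sum_{m <= k} E_m / (N - m + 1)."""
    value = 1.0 + 0.0j
    for m in range(1, r[-1] + 1):
        # Coefficient multiplying E_m in sum_k xi_k * X_{r_k}.
        coeff = sum(x for rk, x in zip(r, xi) if rk >= m) / (N - m + 1)
        value *= 1.0 / (1.0 - 1j * coeff)                     # ch.f. of a standard exponential
    return value

if __name__ == "__main__":
    N, r, xi = 10, [2, 5, 9], [0.3, -0.7, 1.1]
    print(cmath.exp(log_chf(N, r, xi)), chf_from_spacings(N, r, xi))
```

Agreement of the two values reflects the fact that each block of spacings between $r_j$ and $r_{j+1}$ contributes an independent factor.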


75. A Note on Order Statistics and Stochastic Independence. Gerald S. Rogers,
University of Arizona.


The following theorem is proved. Let $x$ be a continuous or discrete type real random
variable. Let $x_1 \le \cdots \le x_n$ be the order statistics based on a random sample of size $n$ from
this $x$ distribution. Let $z = z(x_1, \ldots, x_j)$ be a statistic based on the first $j < n$ items only.
If $z$ is stochastically independent of $x_j$, then $z$ is stochastically independent of all
$x_k$, $j < k \le n$; if $z$ is stochastically independent of some $x_k$, $j < k \le n$, then $z$ is
stochastically independent of $x_j$ and hence of all $x_k$, $j \le k \le n$. The first result is direct, since in
terms of the conditional probability density functions, $g(z \mid x_j) = g(z \mid x_j, \ldots, x_n)$.
For the second part, in $g(x_1, \ldots, x_{k-1} \mid x_k)$, let $x_k$ be considered as a “parameter.” Then
$(x_{k-1} \mid x_k)$ is a “complete single sufficient statistic” for $x_k$; also, the distribution of $(z \mid x_k)$
is free of the “parameter $x_k$.” By a well known theorem (Basu, Sankhya, Vol. 15 (1955),
pp. 377-380), $(z \mid x_k)$ and $(x_{k-1} \mid x_k)$ are stochastically independent. It follows that $z$ and
$x_{k-1}$ are stochastically independent; similarly, with an induction, $z$ and $x_k$, $j \le k \le n$,
are stochastically independent. (Received July 7, 1958.)


76. A model for Failure Data and its Applications. (Preliminary report) André
G. Laurent, Wayne State University.


When an “ageing process” takes place, the response pattern of a “system” to a stimulus $X$
does not follow an exponential distribution. The model $S(t) = \exp[1 + t - \exp(t)]$, where
$S(t)$ is the “survival function,” i.e., the integral of the “$X$-to-failure” distribution and
$t = (X - X_0)/r$, has been proposed to meet this situation and tables provided for its use
(Oper. Res., February, 1957, p. 150; Oper. Res. 13th National Meeting, p. 35). The present
paper describes the more important features of the model above and gives the formulas for
the expected values and the covariance matrix of the order statistics of a sample of size $n$.
Tables of the expected values and the variances for $n = 1$ to 15, and of the covariances for
$n = 2$ to 5, are provided. The minimum variance linear unbiased estimates of the parameters
of the distribution based on order statistics are studied for small samples and compared
to other estimates from the viewpoint of efficiency. Related models are considered.
(Received July 7, 1958.)
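
The ageing behaviour of this survival function shows up in its failure rate. The sketch below is illustrative: it assumes the standardized variable satisfies $t \ge 0$ (consistent with $S(0) = 1$), and the function names are hypothetical. Differentiating $S(t) = \exp[1 + t - \exp(t)]$ gives the density $(e^t - 1)S(t)$ and hence the hazard $e^t - 1$, which increases with $t$, unlike the constant hazard of the exponential model.

```python
import math

def survival(t):
    """S(t) = exp(1 + t - exp(t)) for t >= 0 (domain assumed, since S(0) = 1)."""
    return math.exp(1.0 + t - math.exp(t))

def density(t):
    """-dS/dt = (exp(t) - 1) * S(t)."""
    return (math.exp(t) - 1.0) * survival(t)

def hazard(t):
    """Failure rate density / survival = exp(t) - 1, increasing in t."""
    return math.exp(t) - 1.0

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, survival(t), density(t), hazard(t))
```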


77. A Convolutive Class of Monotone Likelihood Ratio Families. S. G. Ghurye
and David L. Wallace, University of Chicago.


A one-dimensional family $f(x, \theta)$ of densities on the real line or of probabilities on the
integers, with the real parameter $\theta$, is called a monotone likelihood ratio family if the ratio
$f(x, \theta')/f(x, \theta)$ is nondecreasing in $x$ for $\theta \le \theta'$. If several monotone likelihood ratio families
each have all probability on two points which are the same for all families and all parameter
values, then their convolution is a monotone likelihood ratio family. The extent to which
similar results hold for distributions on three and more points and, with appropriate
extensions of definitions, for multidimensional distributions on the vertices of the simplex
and the cube is determined. A sufficient condition that the convolution of monotone
likelihood ratio families be a monotone likelihood ratio family is that for each family, the ratio
$f(x + h, \theta)/f(x, \theta)$ be non-increasing in $x$ for all $h > 0$. (Received July 7, 1958.)
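
The two-point result can be illustrated numerically. In the sketch below the two families, the parameter grid and the helper names are all hypothetical choices made for the example: two Bernoulli-type families on the points 0 and 1 are convolved, and the monotone likelihood ratio property of the result on 0, 1, 2 is checked by cross-multiplication.

```python
def convolve(f, g):
    """Convolution of two probability vectors indexed by 0, 1, 2, ..."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def has_mlr(family, tol=1e-12):
    """family: probability vectors ordered by increasing parameter, all entries positive.
    Checks that f(x, theta') / f(x, theta) is nondecreasing in x for every theta < theta',
    using cross-multiplication on adjacent support points."""
    for a in range(len(family)):
        for b in range(a + 1, len(family)):
            lo, hi = family[a], family[b]
            for x in range(len(lo) - 1):
                if hi[x + 1] * lo[x] < hi[x] * lo[x + 1] - tol:
                    return False
    return True

# Hypothetical example: success probabilities theta and theta^2 on a grid of parameter values.
thetas = [0.1 * k for k in range(1, 10)]
fam1 = [[1.0 - t, t] for t in thetas]
fam2 = [[1.0 - t * t, t * t] for t in thetas]
conv = [convolve(f, g) for f, g in zip(fam1, fam2)]

if __name__ == "__main__":
    print(has_mlr(fam1), has_mlr(fam2), has_mlr(conv))
```

Checking adjacent points is enough here because every probability in the example is strictly positive.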


78. On the Exact Joint Distribution of the First Two Serial Correlation
Coefficients. V. K. Murthy, University of North Carolina.


Any test of the hypothesis that up to a particular lag the true serial correlation coeffi- 
cients are zero against some suitable alternative seems to necessitate knowledge of the 
joint distribution of serial correlation coefficients. As far as the author is aware even in the 
case of the first two serial correlation coefficients, the joint distribution has not so far been 
obtained in a simple closed form. In this note the joint distribution of $r_1$ and $r_2$ has been
obtained for samples of independent normal variates assuming the sample size to be of 
the form 4n + 1 where n is a positive integer and adopting the circular definition suggested 
by Hotelling. This result has been obtained using a result of R. L. Anderson on the char- 
acteristic roots of the serial covariance, and inversion formulae for the distribution of 
ratios of quadratic forms given by Gurland. Some properties of the joint distribution are 
obtained. The case of more than two serial correlation coefficients will be dealt with in a 
subsequent paper. (Research under ONR contract Nonr 855(06).) (Received July 7, 1958;
revised July 28, 1958.)


79. Confidence Bounds Associated with a Test for Symmetry. R. Gnanadesikan,
The Procter and Gamble Company.


In a $p$-variate nonsingular normal distribution $N(\mu, \Sigma)$, one may be interested in testing
a hypothesis of symmetry in the means, viz., that the $p$ variates have the same mean. The
tests obtained by using either the extended Type I union-intersection principle or the
likelihood ratio are identical and it is well known that they are equivalent to an F-test with
appropriate degrees of freedom. However, from the standpoint of confidence procedures, it
is shown that the usual elliptical region can be replaced by simultaneous interval
statements on parametric functions which are measures of departure from the null hypothesis.
Also using a “truncation” procedure it is shown that one can study contrasts which are of
particular interest and are components of the null hypothesis. The interval statements,
which have a joint confidence coefficient $\ge (1 - \alpha)$, are easier to use than the elliptical
regions which have an exact confidence coefficient $(1 - \alpha)$. (Received July 7, 1958.)


80. On Stochastic Approximation. C. Derman and J. Sacks, Columbia
University.


A very general theorem was proved by Dvoretzky (“On stochastic approximation,”
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability)
on the convergence, with probability one and in mean square, of stochastic approximation
procedures. Wolfowitz (“On stochastic approximation methods,” Ann. Math. Stat., Vol.
27, 1956) presented a different proof. In this paper a third and simpler proof of the
probability one convergence is given. Also, the probability one version is extended directly
to the multi-dimensional case with absolute values of real numbers replaced by lengths of
vectors. The one-dimensional theorem is a consequence of the following easily proved
lemma. If $\{a_n\}$, $\{b_n\}$, $\{c_n\}$, $\{\delta_n\}$ and $\{\xi_n\}$ are sequences of real numbers satisfying the following
conditions: (i) $\{a_n\}$, $\{b_n\}$, $\{c_n\}$ are positive, (ii) $\{\xi_n\}$ are non-negative, (iii) $\lim a_n = 0$,
$\sum b_n < \infty$, $\sum c_n = \infty$, $\sum \delta_n < \infty$, (iv) $\xi_{n+1} \le \max(a_n, (1 + b_n)\xi_n + \delta_n - c_n)$ for all $n$
greater than some $N$, then $\lim \xi_n = 0$. The multi-dimensional theorem follows from a slightly
modified version of the above lemma. (Received July 7, 1958.)
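
A small numerical experiment shows the lemma at work. The sketch below is only an illustration, not part of the proof: the particular sequences ($a_n = 1/n$, $b_n = \delta_n = 1/n^2$, $c_n = n^{-1/2}$), the starting value and the function name are hypothetical choices that satisfy conditions (i) to (iii), and the recursion takes equality in (iv), which is the largest value the condition allows.

```python
def worst_case_path(xi1, a, b, c, delta, steps):
    """Iterate xi_{n+1} = max(a_n, (1 + b_n) * xi_n + delta_n - c_n),
    i.e. the largest sequence permitted by condition (iv), starting from xi_1 = xi1."""
    xi = xi1
    path = [xi]
    for n in range(1, steps + 1):
        xi = max(a(n), (1.0 + b(n)) * xi + delta(n) - c(n))
        path.append(xi)
    return path

# Hypothetical sequences: a_n -> 0, sum b_n < inf, sum c_n = inf, sum delta_n < inf.
path = worst_case_path(
    2.0,
    a=lambda n: 1.0 / n,
    b=lambda n: 1.0 / n ** 2,
    c=lambda n: n ** -0.5,
    delta=lambda n: 1.0 / n ** 2,
    steps=10000,
)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        print(n, path[n])
```

The path first rises, then the divergent sum of the $c_n$ drags it down until it follows the vanishing floor $a_n$.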


81. A Classification Problem. Oscar Wesler, University of Michigan.


The following version of “the problem of the $k$-faced die” is considered: Nature's pure
strategies make up two sets of states, $\Omega_1$ consisting of the $k!$ states got by permuting a known
probability distribution $p = (p_1, p_2, \ldots, p_k)$ over the faces $1, 2, \ldots, k$ of a $k$-faced die,
$\Omega_2$ consisting similarly of the $k!$ states arising from a known distribution
$q = (q_1, q_2, \ldots, q_k)$. Classification is made on the basis of $N$ observations given by the
sufficient statistic $r = (r_1, r_2, \ldots, r_k)$ representing the number of times each face appears.
Let $\phi$ be a randomized statistical decision procedure, and let $\alpha(\phi)$ and $\beta(\phi)$ be the maxima
of the probabilities of errors of the first and second kind, respectively. Then we wish to
minimize $\beta(\phi)$ subject to $\alpha(\phi) = \alpha_0$. The class of unique symmetric procedures $\phi^*$ optimal
in this extended Neyman-Pearson sense is found by a game-theoretic, minimax method,
and from the invariance of the problem under the symmetric group of permutations on $k$
letters. A simplification is given for large $N$, in which the $\phi^*$ are replaced by kaleidoscopic
tests, determined by a one-parameter family of hyperplanes and their symmetric images.
Finally, it is shown that, for $k = 2$, the $\phi^*$ and the kaleidoscopic approximations are in
exact agreement for every $N$. (Received July 7, 1958.)


82. Generalization of Palm's Loss Formula for Telephone Traffic. V. E. Beneš,
Bell Telephone Laboratories, Inc.


Let $F$ be a real non-negative function on a space $X$, let $\mathcal{F}$ be a Borel field of $X$-subsets
and let $\xi_k$, $k = 0, 1, 2, \ldots$ be a stationary Markov process taking values in $X$, with
transition function $p(\xi, A)$ for $\xi$ in $X$ and $A$ in $\mathcal{F}$. We interpret the numbers $F(\xi_k)$ as the
inter-arrival times of telephone calls at a trunk group. There are $N$ trunks, lost calls are cleared
and holding-times of trunks are independent, with a negative exponential distribution of
mean $\gamma^{-1}$. We prove the following result: If $P$ is the stationary probability measure of

& , then the chance of loss (of finding all N trunks busy) is [= ( A AP (X 
a 

with Ay = Jand A, = Kyl Ky) K,4:{1 — Ky4il-', where K,, is the operator whose 

action on a measure $\mu$ is defined by $K_n \mu(A) = \int_X \int_A \exp\{-n\gamma F(\xi)\}\, p(\eta, d\xi)\, \mu(d\eta)$. Palm's
formula applies to the case $X = (0, \infty)$, $F(\xi) = \xi$, $\xi_k$ independent. Our formula has the same
algebraic form as Palm's, but the multiplicative constants have been replaced by operators.
The inverses indicated in our formula exist under weak hypotheses. (Received July 14, 1958.)
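
For orientation, the classical scalar case that the abstract generalizes can be computed in a few lines. The sketch below is not taken from the abstract: it uses the form of Palm's loss formula for independent inter-arrival times as it is usually stated in the queueing literature, $[\sum_{n=0}^{N} \binom{N}{n} \prod_{j=1}^{n} (1 - \varphi(j\gamma))/\varphi(j\gamma)]^{-1}$, where $\varphi$ is the Laplace-Stieltjes transform of the inter-arrival distribution. The function names and the numerical values are hypothetical.

```python
import math

def palm_loss(N, gamma, lst):
    """Classical Palm loss probability for N trunks, exponential holding times with
    rate gamma, and independent inter-arrival times with Laplace-Stieltjes transform lst."""
    total = 0.0
    weight = 1.0                              # running product of (1 - phi_j) / phi_j
    for n in range(N + 1):
        if n > 0:
            phi = lst(n * gamma)
            weight *= (1.0 - phi) / phi
        total += math.comb(N, n) * weight
    return 1.0 / total

def erlang_b(N, load):
    """Erlang loss formula, used as a cross-check for Poisson arrivals."""
    terms = [load ** m / math.factorial(m) for m in range(N + 1)]
    return terms[-1] / sum(terms)

if __name__ == "__main__":
    lam, gamma, N = 3.0, 1.0, 5               # hypothetical arrival rate, holding rate, trunks
    poisson_lst = lambda s: lam / (lam + s)   # exponential inter-arrival times
    print(palm_loss(N, gamma, poisson_lst), erlang_b(N, lam / gamma))
```

With exponential inter-arrival times the product collapses and the result coincides with the Erlang loss formula, which is a convenient sanity check.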


83. Factorial Analysis of Life-Tests. Marvin Zelen, National Bureau of
Standards.


Consider a factorial experiment involving the factors A and B having levels $a$ and $b$
respectively. Let a life-test experiment be planned such that $n$ items are tested for each of
the $ab$ factorial combinations and the test is terminated when exactly $r$ ($r \le n$) of the test
items have failed. Assume that the underlying distribution of failures for the $(i, j)$ factorial
combination ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$) is $p(x_{ij}) = \theta_{ij}^{-1} \exp[-(x_{ij} - A_{ij})/\theta_{ij}]$ for
$x_{ij} \ge A_{ij}$, where $\theta_{ij} = m a_i b_j c_{ij}$. Maximum likelihood estimates are found for the
parameters $m$, $a_i$, $b_j$, and $c_{ij}$. Likelihood ratio tests are given for testing various hypotheses for
these parameters as well as approximations for the small sample distribution of these tests.
(Received July 14, 1958.)


84. Unbiased Estimation for Functions of Location and Scale Parameters. 
R. F. Tate. 


Integral transform theory is employed to obtain unbiased estimators (which in many
cases have the minimum variance property) for functions of a location parameter $\theta$ and/or
a scale parameter $\sigma$. Applications are made to the gamma distribution with parameters
considered together and separately, and to truncated distributions in general. A simple
formula is presented for estimating any differentiable function of a single location
parameter of truncation; no calculation of distributions or conditional expectations is required
in order to find a minimum variance unbiased estimator. Special attention is paid
throughout the paper to the estimation of the functions $P(X \in A \mid \theta)$, $P(X \in A \mid \sigma)$, and
$P(X \in A \mid \theta, \sigma)$, where $A$ is an arbitrary Borel set. (Received July 21, 1958; revised July
25, 1958.)


85. Theory of Successive Two-Stage Sampling. (Preliminary report) B. D.
Tikkiwal. (By title)


The general theory of Univariate Sampling on Successive Occasions has been studied
by the author [J. Ind. Soc. Agric. Stat., Vol. 8 (1956), pp. 84-90] under a specified sampling
scheme and correlation pattern. Here the sampling units selected for study on various
occasions are completely enumerated. The present paper gives the best estimator and its
variance under the same sampling scheme when each of the primary units (assumed of the 
same size M) are not completely enumerated but observed only on a sub-sample of size m. 
It is shown that the form of the best estimator is the same as in the univariate case, when 
the pattern of correlation is the same at both the stages. It is further noted, that, for an 
infinite population and $M = \infty$, the variance of the best estimator on the $h$th occasion is
given by $\phi_h / n_h \cdot V$ in the notation of the above paper, where $V$ is the variance of the
simple two stage sampling mean when only one primary unit is selected on the $h$th occasion.
(Received August 1, 1958.) 


86. Functions of Markov Chains. (Preliminary report) Murray Rosenblatt,
Indiana University.


Let $X_n$, $n = 0, 1, \ldots$ be a Markov Chain with initial distribution $w_i = P[X_0 = i]$ and
stationary transition probability matrix $P = (p_{ij})$, $i, j = 1, 2, \ldots$. Let $Y_n = f(X_n)$ and let
$S_\alpha$, $\alpha = 1, 2, \ldots$, be the sets of states of $X_n$ that $f$ collapses into states of $Y_n$. Let class one
consist of those sets of states into which one has access with positive probability from at
most one set of states. Class two is the complementary class of states. A necessary and
sufficient condition that $Y_n$ be Markovian (for a fixed $f$), whatever the initial distribution
$w_i$ of the $X_n$ process, is given as follows: (i) If $S_\alpha$ belongs to class two,
$\sum_{j \in S_\alpha} p_{ij}\, p_{j,S_\beta} = p_{i,S_\alpha} C_{S_\alpha,S_\beta}$ for all $i$, $\beta$. Here $p_{i,S_\alpha} = \sum_{j \in S_\alpha} p_{ij}$. (ii) Given any sequence of sets of states $S_1$, $S_2$,


- , S, where S, is of class two and S,,--- , Sy: of class one, Pij" Pj.s, = Pi.s, 


jeSy_1 
eg’-'). for all cif there is positive probability of the path S, — S,— --- -S, 
(Received September 8, 1958.)


NEWS AND NOTICES


Readers are invited to submit to the Secretary of The Institute news items of interest.


Personal Items 


Dr. Churchill Eisenhart has been granted a Rockefeller award “in recognition 
of outstanding public service,” and will spend the coming academic year in 
England, engaged in research. Dr. Eisenhart, chief of the Statistical Engineering 
Laboratory, which he organized in 1946, will be based at the Research Tech- 
niques Unit of the London School of Economics and Political Science, where he 
will continue preparation of material for a unified treatment of the fundamentals 
of measurement theory and practices as related to the needs of the biological, 
physical, social and behavioral sciences. 

Ira G. Spicer, formerly Project Leader of Technical Analysis at
Minneapolis-Honeywell, has taken a position as Research Engineer with the Lockheed
Missile Systems Division in Sunnyvale, California. 

In August, Nelson M. Blachman will take a two-year leave of absence from 
his job at the Sylvania Electronic Defense Laboratory, Mountain View, California,
to become a Scientific Liaison Officer at the Branch Office of the U.S.
Office of Naval Research in London, England, where he will carry on liaison 
with European scientists in the field of electronics. 

Alan T. James of the Division of Mathematical Statistics, Commonwealth 
Scientific and Industrial Research Organization, Australia, will be a Visiting 
Lecturer at Yale University for the academic year 1958-1959. 

Alfred Lieberman, formerly with the Bureau of Ships, has now joined the 
staff of the Institute for Defense Analyses, Washington, D. C. 

Herman Wold has accepted an invitation to serve as Visiting Professor during 
the academic year 1958-1959 at Columbia University, Economics Department, 
New York. 

Margaret P. Martin has taken the position of Associate Professor of Preventive
Medicine (Biostatistics) at the Upstate Medical Center of the State University
of New York at Syracuse. She formerly held a similar position at Vanderbilt
University.

Hian Liang Ang, Drs. Math., completed his work for his Master's degree in
Statistics at the University of California at Berkeley in October, 1957, and
continued his work toward a Ph.D. degree. He goes back to Indonesia in August,
1958, to resume his post as Lecturer of Mathematics at the University of
Indonesia at Bandung.

Mr. Ulysses V. Ward was appointed Instructor of Mathematics at Howard 
University in September, 1957. 

John E. Freund has recently been appointed Chairman of the Department of
Mathematics of Arizona State College at Tempe (soon to be called Arizona
State University).

Patrick Billingsley has accepted a position as Assistant Professor in the 
Department of Statistics of the University of Chicago. 

John W. Morse has resigned from his position as Head of Economics at Keuka
College, New York, to teach Statistics full-time as Assistant Professor, Economics,
at Hobart and William Smith Colleges, Geneva, New York, while doing
statistical consulting and developing inventions.

John W. Mayne, Director, Operational Research, Royal Canadian Navy, has
been posted by the Defense Research Board to the SHAPE Air Defense Technical
Centre, The Hague, Netherlands, to be Chief of an Operational Research Section.
He expects to be in Europe for about three years.

William E. Jaynes has accepted a position as an Assistant Professor of In- 
dustrial Psychology and Statistics, and Director of the Bureau of Industrial 
Testing and Institutional Research at the University of Omaha in Omaha, 
Nebraska. 

Charles T. Lewis has recently accepted a position as operations analyst in 
the Operations Research Group at Convair in Fort Worth, Texas. 

R. E. Beckwith has accepted a position as Senior Research Engineer with the 
California Institute of Technology Jet Propulsion Laboratory. 

Frances Campbell Amemiya has resigned her position as Chairman of the 
Department and Professor of Mathematics at George Pepperdine College in 
Los Angeles. She is now Associate Professor of Mathematics at the California 
Western University. 

Dr. John E. Walsh, formerly with the Military Operations Research Division
of Lockheed Aircraft Corporation, is now with the Operations Research Group
of the System Development Corporation, 2400 Colorado Avenue, Santa Monica,
California.

H. W. G. Deeks has been appointed Statistician in the War Office, Whitehall,
London, S.W.1.

Dr. Om P. Aggarwal has returned to Purdue University as Associate Professor 
after spending a year at the University of Alberta (1956-57) as Visiting Associ- 
ate Professor and another at the University of Saskatchewan. While in Canada, 
Professor Aggarwal was also a Fellow at the Summer Research Institute of the 
Canadian Mathematical Congress which is held every summer at Queen’s 
University, Kingston, Ontario. 

Ingram Olkin, on sabbatical leave from Michigan State University, will be at 
Stanford University for the academic year 1958-1959. 

J. E. Morton is serving as Statistical Adviser to the UN Economic Commis- 
sion for Asia and the Far East in Bangkok, Thailand; he is also giving a course 
at Chulalongkorn University in Bangkok on Linear Programming.

On July 2, 1958, Alan H. Gepfert became Director of Statistical Research of
the Chicago and North Western Railway Company. His major present job is
to develop economic models by which to forecast revenues. He is also concerned with
the application of sampling and regression analyses to cost-finding and general
corporate planning. Mr. Gepfert was formerly a member of the operations research
group and faculty of Case Institute of Technology.

The Data Processing Division of International Business Machines Corporation
completed its move to Westchester during July. Its new address is International
Business Machines Corporation, Data Processing Division, 112 East
Post Road, White Plains, New York.

Professor Herbert Solomon, Teachers College, Columbia University, is spending
a sabbatical year at Stanford at Berkeley. His mailing address is Statistics
Dept., Stanford University, Stanford, California.




Fellowship and Research Opportunities
National Academy of Sciences-National Research Council, Division of Mathematics


The Division of Mathematics calls attention to the fact that several foundations and
offices offer financial support for research in mathematics during the year 1959-60. A number
of fellowships will be made available, as well as opportunities for mathematicians to engage
in basic research. A partial list, with comments, is given below.

1. National Science Foundation. The National Science Foundation sponsors various
fellowship programs in the sciences, including mathematics.

Predoctoral fellowships are awarded annually at the First Year, Intermediate and
Terminal Year levels of graduate study. Applications for 1959-1960 will be available in
October 1958 from the National Academy of Sciences-National Research Council until
the closing date in early January 1959; award date: March 16, 1959.

Science Faculty fellowships for college science teachers (including mathematics) who
plan to continue teaching and wish to increase their competence as teachers are at the
present time offered semi-annually. Eligibility requirements include a baccalaureate degree
and three (3) years of full-time experience in teaching natural science subjects at the collegiate
level. Awarded annually. The program will be open from May to October. Awards
will be announced in early December. Address all inquiries for information and applications
to National Science Foundation, Division of Scientific Personnel and Education, Washington
25, D. C.

Postdoctoral fellowships (in making inquiry about postdoctoral awards specify program):

(1) Regular postdoctoral fellowships: primarily for recipients of the doctoral degree;
awarded semi-annually. Program for 1959-60 concurrent with predoctoral program (see
above) except that program closes in December. Information and applications will be available
from the NAS-NRC. The program will also be open from July to early September 1959.
Awards are announced in March and October.

(2) Senior postdoctoral fellowships are open to persons who have held a doctoral
degree in one of the basic fields of science for a minimum of five (5) years at time of application,
or who have had equivalent training and experience. Awarded annually. Applications
are available from the National Science Foundation, Division of Scientific Personnel
and Education, Washington 25, D. C. The program will be open from May to October.
Awards will be announced in early December.

Research Grants. The National Science Foundation also supports basic research in the 
mathematical sciences by means of grants. While proposals for such support are accepted 
at any time, individuals desiring support to begin in the summer or at the beginning of a 
fall semester should preferably submit their proposals in the mathematical sciences by 
November 1; persons desiring support to begin in the spring semester should preferably 
submit their proposals by May 1. Instructions for the preparation of proposals, contained 
in a booklet entitled Grants for Scientific Research, may be obtained upon request from 
the Program Director for Mathematical Sciences, National Science Foundation, Washing- 
ton 25, D.C. 

2. Office of Naval Research. The Office of Naval Research, through contracts with universities
and other organizations, supports basic research in broadly selected fields of
mathematics. Proposals should be directed to the Mathematics Branch, Office of Naval 
Research, Washington 25, D. C. In addition, postdoctoral research associateships in pure 
mathematics are being established under contracts with the ONR at selected universities. 
For details and application forms write to the above address. 

3. Air Force Office of Scientific Research. The Air Force Office of Scientific Research sup- 
ports research in mathematics directly through contracts with colleges, universities, 
foundations and industrial laboratories. Such organizations are encouraged to submit 
proposals for research in mathematical fields in which they specialize. Proposals should be 
mailed to the Commander, Air Force Office of Scientific Research, Attn: Mathematics 
Division, Washington 25, D. C.

4. Office of Ordnance Research, U.S. Army. Among the functions of the Office of Ordnance 
Research is the support of basic research in mathematics. Proposals for projects are ordinarily
made by individual scientists or groups of scientists in a form which leads to a contract
between the Office of Ordnance Research and a university or research laboratory.
For further information write to Commanding Officer, Office of Ordnance Research, Box
CM, Duke Station, Durham, North Carolina. 

5. Fulbright Awards—Public Law 584 (79th Congress). Approximately 400 awards are 
offered annually for university lecturing and postdoctoral research in all academic fields 
in Argentina, Australia, Brazil, Burma, Chile, Colombia, Ecuador, India, New Zealand, 
Pakistan, Paraguay, Peru, the Philippines and Thailand (competition for the preceding 
countries closes April 15, 1959); Austria, Belgium-Luxembourg, Republic of China, Denmark,
Finland, France, Germany, Greece, Iceland, Iran, Ireland, Israel, Italy, Japan,
the Netherlands, Norway, Turkey, and the United Kingdom including colonial dependencies
(competition for the latter countries closes October 1, 1959). In both cases awards
are for the academic year 1960-61 (the 1959-60 competition for Europe closes October 1,
1958), but in the former group of countries the academic year begins in the spring or summer
instead of the autumn. Awards are payable in foreign currency and usually include travel
for the grantee, but not for members of his family, and a maintenance allowance, which
may be adjusted in relation to the number of accompanying dependents up to four. Requests
for information should be addressed to the Committee on International Exchange
of Persons, Conference Board of Associated Research Councils, 2101 Constitution Avenue, 
Washington 25, D.C. 

6. National Bureau of Standards. Naval Research Laboratory. Air Research and Development
Command. Postdoctoral resident research associateships are available in a variety of
sciences including mathematics and are tenable at the Washington, D. C. and Boulder,
Colorado laboratories of the National Bureau of Standards; at the Naval Research Laboratory
in Washington, D. C.; and at selected development and research centers of the Air
Research and Development Command. Necessary facilities and equipment incident to the
research of the associate will be provided. For further information write to Fellowship
Office, National Academy of Sciences-National Research Council, 2101 Constitution Avenue,
Washington 25, D. C. Applications for the 1959-60 program must be filed on or before
January 19, 1959. 

7. Atomic Energy Commission. The Division of Research of the Atomic Energy Commission,
through contracts with universities and other organizations, supports research in
the fields of numerical analysis, digital computer design, programming research, and related
topics. Proposals should be submitted to the Division of Research, Atomic Energy
Commission, Washington 25, D. C. 

Brookhaven National Laboratory. Brookhaven National Laboratory, operated by Associated
Universities, Inc. under contract with the Atomic Energy Commission, offers
postdoctoral research appointments in mathematics. Appointments are for one year, and
may be renewed for one additional year. U.S. citizenship is not required, although Atomic
Energy Commission approval is a prerequisite. The appointee may work in numerical analysis,
digital computing, mathematical physics, differential equations, probability and
statistics, and various specialized branches including reactor theory, hydrodynamics, and
orbit theory. Computational facilities are available. Letters from candidates should give
details of personal history, scientific background, and qualifications; two letters of recom- 
mendation, one from the applicant’s research professor, are required. Applications should 
be directed to M. E. Rose, Head, Applied Mathematics Division, Brookhaven National 
Laboratory, Upton, Long Island, New York. 


September 1, 1958 
S. S. Wilks, Chairman
Division of Mathematics 
M. H. Martin, Executive Secretary 
Division of Mathematics 




Committee on Statistics 


A new committee, the Committee on Statistics, has been established in the 
Division of Mathematics, NAS-NRC. It has been established in the Division as 
the successor to the Committee on Applied Mathematical Statistics which was 
appointed in 1942 and placed directly under the Academy-Research Council
Governing Board. The funds in the custody of the earlier committee, and 
amounting to approximately $5,000, have been transferred to the custody of the 
new committee. 




Cost-Free Digital Computer Time 


As announced in the March, 1958, Annals, pp. 343-547, the Committee on 
Mathematical Tables of the Institute of Mathematical Statistics has made a 
survey of cost-free time on digital computers in the United States. The survey, 
in which 171 digital computer installations were queried, is now complete. Cost- 
free time is available at approximately 40 installations in at least 18 states in 
all parts of the United States. Members of the Institute of Mathematical Statistics
who wish to avail themselves of some of this cost-free time to compute
on a problem of general interest, i.e., a problem which might lead to publication 
of results in a professional journal, are invited to get in touch with the Chairman 
of the Subcommittee on Cost-Free Machine Time, Professor Fred C. Leone, 
Statistical Laboratory, Case Institute of Technology, Cleveland 6, Ohio. Advice 
on the preparation of specific tables (but not advice on programming or numerical 
analysis) is available from the other subcommittees and the reader is referred 
to the March, 1958, Annals for a complete listing of them. 

D. B. Owen, Chairman 
Committee on Mathematical Tables 


Preliminary Actuarial Examinations Prize Awards


The winners of the prize awards offered by the Society of Actuaries to the nine
undergraduates ranking highest on the score of Part 2 of the 1958 Preliminary 
Actuarial Examination are as follows: 

First Prize of $200: Daniel G. Quillen, Harvard University. 

Additional Prizes of $100 each: Edward J. Barbeau, Jr., Toronto University;
William H. Blake, Jr., George Washington University; Theodore M. Jungreis,
Rensselaer Polytechnic Institute; David H. Krantz, Yale University; Joe
Lipman, Toronto University; Dennis W. Moore, Harvard University; Theodore
S. Rosky, State University of Iowa; Lawrence A. Shepp, Brooklyn Polytechnic
Institute.

The Society of Actuaries has authorized a similar set of nine prizes for the 
1959 examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three exami- 
nations: Part 1. Language Aptitude Examination. (Reading comprehension, 
meaning of words and word relationships, antonyms, and verbal reasoning.) 
Part 2. General Mathematics Examination. (Algebra, trigonometry, coordinate
geometry, differential and integral calculus.) Part 3. Special Mathematics
Examination. (Probability and statistics.)

The 1959 Preliminary Actuarial Examinations will be prepared by the Edu- 
cational Testing Service under the direction of a committee of actuaries and 
mathematicians and will be administered by the Society of Actuaries at centers 
throughout the United States and Canada on May 13, 1959. The closing date 
for applications is April 1, 1959. 




Postdoctoral Study in Statistics 


Awards for study in statistics by persons whose primary field is not statistics
but one of the physical, biological, or social sciences to which statistics can be 
applied are offered by the Department of Statistics of the University of Chicago. 
The awards range from $3,600 to $5,000 on the basis of an eleven month resi- 
dence. The closing date for application for the academic year 1959-60 is Febru- 
ary 16, 1959. Further information may be obtained from the Department of 
Statistics, Eckhart Hall, University of Chicago, Chicago 37, Illinois.




Nonparametric Statistics 


A revision is being made of “Bibliography of Nonparametric Statistics and
Related Topics,” Journal of the American Statistical Association 48 (1953),
pp. 844-906. Material through 1959 is to be included with more emphasis, it is
hoped, on applications than previously. References (particularly to the non-English
literature), reprints, and technical reports on the theory or applications
of nonparametric statistics would be greatly appreciated. Also, corrections and
additions to the original bibliography are desired.




University of Michigan Summer Program in Health Statistics 


The School of Public Health, University of Michigan, will have a summer 
program in health statistics, June 18 through August 1, 1959. The faculty will 
be assembled from many of the schools of public health, and from the ranks of 
leading workers in the field of statistics in the health sciences. Tentative course 
titles are: Statistical Methods in Public Health, Management of Health Agency
Records, Registration and Vital Statistics, Biostatistics in the Health Sciences,
Demographic Methods in Public Health, Statistical Methods in Epidemiology, 
Sampling Techniques in the Health Sciences, Advanced Biostatistics in the 
Health Sciences, Statistical Methods in Biological Assay. 

Further information can be obtained from F. D. Hemphill, School of Public
Health, University of Michigan, Ann Arbor, Michigan. 




Additional Doctoral Dissertations 


The following doctorates, conferred in 1957, should be added to the list 
published in the June, 1958 issue of these Annals. 


Kupperman, Morton, The George Washington University, major in mathematical statistics,
“Further Applications of Information Theory to Multivariate Analysis and Statistical
Inference.”

McCall, Chester H., Jr., The George Washington University, major in mathematical
statistics, “The Linear Hypothesis, Information, and the Analysis of Variance.”

NaNagara, Prasert, Cornell, major in statistics, “Lattice Rectangle Designs.”




New Members 


The following persons have been elected to membership in The Institute 
May 14, 1958, to June 23, 1958


Allen, (Rev.) Raymond W., Ph.D. (St. Louis University), Chairman, Department of Mathematics,
Xavier University, Cincinnati 7, Ohio.

Amster, Sigmund J., M.S. (Columbia University), Student, University of North Carolina, 
119 Harvey Street, Philadelphia, Pa. 

Burton, Ellison Stanley, B.A. (Amherst College), Systems Engineer (Statistician) Consultant,
Harper Engineering Company, Santa Monica, California, 4650 East 19th Street,
Tucson, Arizona.

Cook, William H., A.B. (Hofstra College), Mathematical Statistician, U.S. Bureau of the
Census, Statistical Research Division, Washington 26, D.C. 

Edwards, Bernard, B.Sc. (London), Lecturer in Statistics, Municipal College of Commerce, 
College Street, Newcastle Upon Tyne, Northumberland, England. 

Friedman, Morton Philbert, M.A. (Ohio State University), Student, Ohio State University, 
Columbus, Ohio, 865 Northwest Blvd., Columbus 12, Ohio. 

Goodman, Arnold F., B.S. (N.C. State College), Graduate Assistant, Stanford University, 
Stanford, California, 621 Harvard Avenue, Menlo Park, California. 

Hakim, Muhammad A., M.Sc. (Univ. of Calcutta) Graduate Student, University of Cali- 
fornia, Department of Statistics, Berkeley 4, California 

Harkness, William L., M.A. (Michigan State University) Special Graduate Research 
Assistant, Department of Statistics, Michigan State University, East Lansing, Michigan. 

Higa, Seiko, B.A., (Pacific University) Statistician, Finance Department, U.S. Civil Ad- 
ministration of the Ryukyu Islands, Naha, Okinawa, Ryukyu Islands. 

Hill, Bruce M., M.S. (Stanford University), Assistant in Statistics, Graduate Student, 
Stanford Statistics Department, Stanford University, Stanford, California. 

Johnson, Whitney Larsen, M.S. (University of Minnesota), Instructor, Dept. of Math. 
Institute of Technology, University of Minnesota, Minneapolis 14, Minn. 2175.1 Folwell, 
St. Paul 8, Minn 

Kaller, Cecil, M.A. (Saskatchewan), Research Fellow, Purdue University, Lafayette,
Indiana, Statistical Laboratory, Purdue University, Lafayette, Indiana. 

Masuyama, Motosaburo, (Doctor of Science), Chief of the Laboratory of Environmental
Hygiene, Meteorological Research Institute, Tokyo, Institute of Physical Therapy & 
Internal Medicine, Faculty of Medicine, Tokyo University, Bunkyo-ku, Tokyo, Japan. 

Middleton, David, Ph.D. (Howard University), Consulting Physicist, 23 Park Lane,
Concord, Mass.

Mills, Harlan Duncan, Ph.D. (Iowa State), Research Associate, Princeton University,
Princeton, New Jersey, 186 Elm Road, Princeton, New Jersey.

Nemenyi, Peter B., M.A. (Princeton University), Assistant Statistical Analyst, Metropolitan
Life Insurance Company, Madison Avenue, New York 10, N. Y.; also Lecturer,
Hunter College, Pk. Ave. and 68th Street, New York.

Niedzielski, Edmund L., Ph.D. (Fordham University), Research Chemist, E. I. DuPont
De Nemours and Co., Petroleum Laboratory, P.O. Box 1671, Wilmington 99, Del.

Okamoto, Masashi, M.S. (Tokyo University), Lecturer in Mathematical Statistics, Osaka
University, Japan, Nakanoshima, Kita-ku, Osaka, Japan.

Pincus, Louis, B.S. (The City College of New York), Senior Statistician, New York City
Department of Health, 125 Worth Street, New York 12, N. Y., 451 Kingston Avenue,
Brooklyn 25, New York.

Randels, Robert B., Ph.D. (University of Michigan), Physicist, Corning Glass Works,
Houghton Park, Corning, New York 

Robinson, Enders A., Ph.D. (Massachusetts Institute of Technology), Assistant Professor
of Statistics, Michigan State University, Department of Statistics, Michigan State University,
East Lansing, Michigan.

Sato, Sokuro, (Tokyo Technical College) Assistant Professor, Faculty of Education, 
Saga University, Saga City, Saga Prefecture, Japan, Dokkoo-koji, Mizugaemachi, 
Saga City, Saga City, Japan 

Slud, Maurice H., M.A. (Columbia University), Mathematical Analyst, General Electric
Company, Missile and Ordnance Systems Department, 3198 Chestnut Street, Philadelphia 4, 
Pennsylvania. 

Smith, William Roger, M.S. (University of Wisconsin), Student, University of California, 
§345 Zara Avenue, Richmond, California. 

Tsutakawa, Robert K., M.S. (University of Chicago), Quality Analyst A, Quality Control
Department, Pilotless Aircraft Div., Boeing Airplane Co., Seattle, Washington, 11034
33rd Avenue, Seattle, Washington.

Welch, Peter D., M.S. (University of Wisconsin), Staff Engineer, IBM Research Center,
Yorktown Heights, New York. 

Zacks, Shelemyahu, B.A. (Hebrew University), Statistician of the Building Research Station,
Technion, The Technion Research Institute, Haifa, Israel.

Zehna, Peter W., M.A. (University of Kansas), Research Assistant, Stanford University,
Applied Mathematics and Statistics Laboratory, Stanford University, Stanford, California.




REPORT OF THE CAMBRIDGE, MASSACHUSETTS MEETING OF 
THE INSTITUTE OF MATHEMATICAL STATISTICS 


The seventy-eighth meeting of The Institute of Mathematical Statistics and 
the twenty-first annual meeting was held at the Massachusetts Institute of 
Technology, Cambridge, Massachusetts, on August 25-28, 1958, in conjunction
with the meetings of the American Mathematical Society, the Mathematical 
Association of America, the Society for Industrial and Applied Mathematics, 
and the Econometric Society. 

The program of the meeting was as follows: 


MONDAY, AUGUST 25, 1958 


9:00 A.M. Invited Papers on Regression and Analysis of Variance 


Chairman: Franklin A. Graybill, Oklahoma State University
1. Variance Component Analysis in Models Where Effects Are Time Variables, A. W.
Wortham (presented by Leroy Folks), Texas Instruments Inc., Dallas
2. Industrial Experience with 2^(p-q) Fractional Factorial Experiments, Cuthbert Daniel,
New York City
3. Confidence and Significance Procedures for Non-linear Models, M. B. Wilk, Bell
Telephone Laboratories, Murray Hill


11:15 A.M. Wald Lecture I 


Chairman: J. L. Hodges, Jr., University of California, Berkeley
The Mathematical Basis of Fiducial Inference, John W. Tukey, Princeton University


2:00 P.M. Invited Papers on Estimation and Testing 


Chairman: Donald L. Burkholder, University of Illinois
1. Power of the Chi-square Test, J. L. Hodges, Jr., University of California, Berkeley
2. On Solutions of Dorfman’s Mass-Testing Problem, Milton Sobel, Bell Telephone
Laboratories, Allentown
3. Linear Regression in the Multivariate Normal Case, Charles Stein, University of
California, Berkeley


4:00 P.M. Invited Papers on Testing 


Chairman: D. B. Owen, Sandia Corporation 
1. Partial Orderings of Probabilities of Rank Orders, I. Richard Savage, University
of Minnesota
2. Simple Methods for Analysis of Two-Action Problems with Linear Costs, Robert
Schlaifer, Harvard University


8:00 P.M. 1958 Council Meeting 


TUESDAY, AUGUST 26, 1958
9:00 A.M. Contributed Papers, I. (Simultaneous with Contributed Papers II) 


Chairman: Charles Stein, University of California, Berkeley
1. Truncation and Tests of Hypotheses, Om P. Aggarwal, Purdue University, and
Irwin Guttman, Princeton University (by title)
2. Admissible Estimates and Maximum Likelihood Estimates (A Sketch of a Unified
Theory of Estimation), Allan Birnbaum, Columbia University
3. A Note on Estimating Translation and Scalar Parameters, Joseph A. Dusay, University
of Oregon
4. A Convolutive Class of Monotone Likelihood Ratio Families, S. G. Ghurye and
David L. Wallace, University of Chicago
5. Power and Control of Size of Some Optimal Welch-type Statistics, Roger S. McCullough
and John Gurland, Iowa State College
6. Some Population Estimation Models and Related Limit Distributions, Ronald Pyke
and N. Donald Ylvisaker, Stanford University
7. On the Choice of Sample Size in the Kolmogorov-Smirnov Tests, Judah Rosenblatt,
Purdue University
8. Theory of Successive Two-Stage Sampling (preliminary report), B. D. Tikkiwal,
Karnatak University, Dharwar, India (by title)
9. On the Existence of Wald’s Sequential Test, Robert A. Wijsman, University of
Illinois (by title)
10. Functions of Markov Chains (preliminary report), Murray Rosenblatt, Indiana
University (by title)


9:00 A.M. Contributed Papers, II 


Chairman: Jack Nadler
1. Confidence Bounds Associated with a Test for Symmetry, R. Gnanadesikan, The
Procter & Gamble Company (by title)
2. Determining Bounds on Integrals with Applications to Cataloging Problems, Bernard
Harris, George Washington University (by title)
3. On the Exact Joint Distribution of the First Two Serial Correlation Coefficients,
V. K. Murthy, University of North Carolina
4. A Classification Problem Involving Multinomials, Oscar Wesler, University of
Michigan
5. Applications of a Certain Representation of the Wishart Matrix, Robert A. Wijsman,
University of Illinois
6. On the Problem of Incomplete Data, Junjiro Ogawa and Bernard S. Pasternack,
University of North Carolina
7. Unbiased Estimation for Functions of Location and Scale Parameters, R. F. Tate,
University of Washington (by title)
8. Uniqueness of the L2 Association Scheme, S. S. Shrikhande, University of North
Carolina
9. On the Asymptotic Minimax Character of the Sample d.f. of Vector Chance Variables,
J. Kiefer and J. Wolfowitz, Cornell University (by title)


11:15 A.M. Wald Lecture II 


Chairman: William H. Kruskal, University of Chicago
The Mathematical Basis of Fiducial Inference (continued), John W. Tukey, Princeton
University


2:00 P.M. Invited Papers on Probability and Stochastic Processes, I 


Chairman: J. R. Blum, University of Indiana
1. A Moment-Problem with Restriction on Smoothness, C. L. Mallows, Princeton
University
2. About the Central Limit Problem, Michel Loève, University of California, Berkeley
3. Hausdorff Dimension and Information Theory, Patrick Billingsley, Princeton
University and University of Chicago
4. A Geometry of Binary Sequences Associated with a Class of Error-Correcting Codes,
Roy R. Kuebler, Jr., University of North Carolina


4:00 P.M. Special Invited Address 


Chairman: Shanti S. Gupta, University of Alberta
Multiple Decision Selection Procedures, Milton Sobel, Bell Telephone Laboratories,
Allentown


WEDNESDAY, AUGUST 27, 1958 
9:00 A.M. Invited Papers on Sequential Analysis


Chairman: Kenneth J. Arnold, Michigan State University
1. A Modification of Sequential Analysis to Reduce the Sample Size, T. W. Anderson,
Center for Advanced Study in Behavioral Sciences and Columbia University
2. Binomial Sequential Testing, Colin R. Blyth, Stanford University
3. Unbiased Sequential Estimation for Binomial Populations, Morris H. DeGroot,
Carnegie Institute of Technology


11:15 A.M. Wald Lecture III 


Chairman: Marvin Zelen, National Bureau of Standards
The Interpretation of Fiducial Inference, John W. Tukey, Princeton University


2:00 P.M. Invited Papers on Probability and Stochastic Processes II. (Simul- 
taneous with Invited Papers on Random Balance.) 


Chairman: Max Woodbury, New York University
1. Semigroups of Operators and Stochastic Processes, A. V. Balakrishnan, University
of California, Los Angeles
2. Independent Polynomials in Normal Variates, R. G. Laha, Catholic University of
America and Columbia University
3. On Multi-event Renewal Processes, Ronald Pyke, Stanford University


2:00 P.M. Invited Papers on Random Balance 


Chairman: Frank J. Anscombe, Princeton University
1. Introductory Remarks, Frank J. Anscombe, Princeton University
2. On the Analysis of Screening Experiments, E. M. L. Beale, Princeton University,
and C. L. Mallows, Princeton University
3. Analysis Methods for Randomly Balanced Factorial Designs, A. P. Dempster, Bell
Telephone Laboratories and Harvard University
4. Mathematical Outline of Polyvariable Analysis (including Random Balance), F. E.
Satterthwaite, Statistical Engineering Institute, Wellesley Hills


4:00 P.M. Wald Lecture IV 


Chairman: M. B. Wilk, Bell Telephone Laboratories
What Importance Should We Place on Fiducial Inference? John W. Tukey, Princeton
University


5:30 P.M. Business Meeting 
8:00 P.M. 1959 Council Meeting 


THURSDAY, AUGUST 28, 1958
9:00 A.M. Contributed Papers III (Simultaneous with Contributed Papers IV) 


Chairman: Robert A. Wijsman, University of Illinois
1. On a Limiting Distribution Due to Renyi, D. G. Chapman, University of Washington
2. Stochastic Models for the Electron Multiplier Tube, Edward K. Dalton, Willard
D. James, and Howard G. Tucker, University of California at Riverside
3. On Stochastic Approximation, C. Derman and J. Sacks, Columbia University
4. Single Server Queuing Processes with a Finite Number of Sources, Gerald Harrison,
The Teleregister Corporation
5. A Model for Failure Data and its Applications, Andre G. Laurent, Wayne State
University
6. Generalization of Palm's Loss Formula for Telephone Traffic, V. E. Benes, Bell
Telephone Laboratories, Murray Hill
7. The Moments of the Maximum of Partial Sums of Independent Random Variables,
John S. White, Minneapolis-Honeywell Regulator Co.
8. Stochastic Models for Length of Life, Benjamin Epstein, Wayne State University
and Stanford University (by title)
9. Tests for the Validity of an Exponential Distribution of Life, Benjamin Epstein,
Wayne State and Stanford Universities (by title)


9:00 A.M. Contributed Papers IV 


Chairman: M. V. JOHNS, JR., Stanford University 
1. Estimation of the Medians for Dependent Variables, OLIVE JEAN DUNN, University 
of California, Los Angeles 
2. The Use of Sample Quasi Ranges in Estimating Population Standard Deviation, 
H. LEON HARTER, Wright Air Development Center 
3. Order Statistics and Estimation, M. M. RAO, University of Minnesota (by title) 
4. A Note on Order Statistics and Stochastic Independence, GERALD S. ROGERS, 
University of Arizona 
5. Aids for Fitting the Pearson Type III Curve by Maximum Likelihood, J. ARTHUR 
GREENWOOD, Iowa State College, AND DAVID DURAND, Massachusetts Institute 
of Technology 
6. On the Relationship Algebra and the Association Algebra of the Partially Balanced 
Incomplete Block Design, JUNJIRO OGAWA, University of North Carolina (by title) 
7. Factorial Analysis of Life-Tests, Marvin ZELEN, National Bureau of Standards 
8. On the Analysis of Factorial Experiments without Replication, ALLAN BIRNBAUM, 
Columbia University (by title) 

9. On Logistic Order Statistics, ALLAN BIRNBAUM, Columbia University (by title) 

10. Statistical Theory of Some Quantal Response Models, ALLAN BIRNBAUM, Columbia 
University (by title) 

11. Optimum Designs in Regression Problems, J. KIEFER AND J. WOLFOWITZ, Cornell 
University, (by title) 

12. On the Bounds for the Variance of Mann-Whitney Statistic, JAGDISH SHARAN RUSTAGI, 
Michigan State University, (by title) 

13. A Characterization of Triangular Association Scheme, S. S. SHRIKHANDE, University 
of North Carolina (by title) 

14. A Problem in Two-Stage Experimentation, DONALD L. RICHTER, University of North 
Carolina (by title) 


11:15 A.M. Special Invited Address 


Chairman: SAMUEL W. GREENHOUSE, National Institutes of Health 
Estimation Methods in Multivariate Analysis, EVAN J. WILLIAMS, North Carolina 
State College 
2:00 P.M. Special Invited Address 


Chairman: MORRIS H. HANSEN, Bureau of the Census 


On a Formal Structure of Professional Practice in Sampling, W. EDWARDS DEMING, 
New York University 


3:15 P.M. Invited Papers on Mixed Topics 


Chairman: H. A. DAVID, Virginia Polytechnic Institute 
1. The Number of Occupied Cells of a Particular Subclass (When Objects are Assigned 
to Cells at Random), HOWARD L. JONES, Illinois Bell Telephone Company, Chicago 
2. Statistical Theory of Tests of a Mental Ability, ALLAN BIRNBAUM, Columbia 
University 
3. Properties of Some Control Chart Tests for Detecting Shifts in a Process Average, 
S. W. ROBERTS, Bell Telephone Laboratories, New York 


——


REPORT OF THE PRESIDENT FOR 1958 


To the membership of the IMS: 
Dear Friends: 

I am writing to you shortly before my departure from the country, because 
I shall unfortunately not be here to address you personally at the annual 
meeting. 

The presidency of the Institute is an honor greater than I had expected to 
receive in my whole life. I thank you all for your expression of recognition and 
confidence. I also thank for myself, and for all of us, the many officers, com- 
mittee members, and representatives who have so loyally and competently 
handled the Institute’s increasingly complex and serious business. 

The president’s job is not among the harder ones in the Institute, but I have 
found it interesting and instructive. The office entails some decisions and of- 
fers opportunities to make suggestions throughout the Institute from a central 
vantage point. It also offers opportunities to make mistakes, and I have made 
some. Those that have thus far come to light are mostly small and rectifiable. 

Since the meeting at Atlantic City a year ago there have been regional meet- 
ings at Los Angeles, Gatlinburg, and Ames. 

T. E. Harris announced a year ago that he wanted to relinquish the Editorship 
at the expiration of his term on July 1, 1958. An ad hoc committee consisting 
of William Cochran (Chairman), T. W. Anderson, M. S. Bartlett, T. E. Harris, 
W. A. Wallis, and S. S. Wilks recommended to the Council that W. H. Kruskal be 
appointed to the Editorship. The Council has accepted this recommendation and 
Kruskal has accepted the appointment. The Annals continues to grow and improve 
by leaps and bounds, as I expect you will hear in detail from the retiring 
Editor. 

The Council decided in Atlantic City that, to meet the rising costs of print- 
ing a larger Annals at higher rates for printing, we ought to apply to the Na- 
tional Science Foundation for a grant of money for a three-year period. I hope 
that it will be possible to announce along with the reading of this letter that 
the grant has been made. 

The Council has before it a plan, submitted by an ad hoc committee, to ap- 
proach the National Science Foundation for aid in preparing translations of 
Russian statistical and probabilistic literature. 

A summer institute on nonparametric methods will be held at Minneapolis 
before this letter is read, and an advisory committee will report to the Council 
at this meeting on what plans, if any, should be made for a summer institute 
in 1959. 

The final duty of the president is to appoint a new nominating committee, 
and I herewith appoint: T. E. Harris, Chairman, Herman Chernoff, David 
Cox, J. C. Kiefer, and W. J. Youden. 

Deeply regretting having to take my leave thus, in absentia, I am 

Most sincerely yours, 
L. J. SAVAGE 
President 


IMS OFFICERS, COMMITTEES, AND REPRESENTATIVES 
FOR 1957-1958. 


Council Members and Officers 


Terms Expire 1958      Terms Expire 1959      Terms Expire 1960 
R. C. Bose             T. W. Anderson         David Blackwell 
Churchill Eisenhart    M. S. Bartlett         Harold Hotelling 
Oscar Kempthorne       J. Berkson             Jerzy Neyman 
W. J. Youden           E. L. Lehmann          I. R. Savage 

President: L. J. Savage Secretary: G. E. Nicholson 

President-elect: J. Wolfowitz Treasurer: A. H. Bowker 


Editor: T. E. Harris 


Fellows elected in 1958: Julius R. Blum, James Durbin, Benjamin Epstein, J. Hemelrijk, 
Leo Katz, Tatsuo Kawata, George E. Nicholson, Jr., Howard Raiffa, Sixto Rios, Stefan 
Vajda, Geoffrey S. Watson, Lionel Weiss. 


Committees 


(The first person named is the chairman) 

IMS COMMITTEE ON EXCHANGES: P. S. Dwyer. 

IMS COMMITTEE ON FELLOWS: Frank J. Anscombe, Z. W. Birnbaum, L. A. Goodman, 
W. Hoeffding, E. S. Pearson, E. L. Scott. 

IMS FINANCE COMMITTEE: Mel Peisakoff, A. H. Bowker, Cuthbert Hurd, Theodore 
Yntema. 

IMS MEMBERSHIP COMMITTEE: Benjamin Epstein, H. E. Daniels, Meyer Dwass, 
Solomon Kullback, Sigeiti Moriguti, G. R. Seth, Rosedith Sitgreaves, Milton Terry. 

IMS COMMITTEE FOR INSTITUTIONAL MEMBERS: Mervin Muller, Z. W. Birnbaum, 
R. Bradford Murphy, Frank Akutowicz, K. J. Arnold, S. S. Wilks. 

IMS COMMITTEE ON PROFESSIONAL STANDARDS: Joseph Lev. 

IMS PROGRAM COMMITTEE FOR 1958 ANNUAL MEETING: W. Kruskal, F. J. Anscombe, 
R. J. Bose, D. L. Burkholder, D. G. Chapman, Cuthbert Daniel, T. S. Ferguson, 
Evelyn Fix, F. A. Graybill, H. O. Hartley, P. J. McCarthy, Howard Raiffa. 

IMS PROGRAM COMMITTEE OF THE CENTRAL REGION: Jack Silber, Virgil Anderson, 
Charles Bell, H. T. David, F. A. Graybill, E. R. Immel, Bernard Ostle, M. B. Wilk 
(ex officio), W. Kruskal (ex officio), R. N. Bradt, Frank Graybill, Irving Burr, John 
Gurland, Howard Jones, Fred Leone, Boyard Rankin, Paul Rider, Jack Silber (ex officio), 
Martin Wilk (ex officio). 

IMS EASTERN REGION PROGRAM COMMITTEE: Boyd Harshbarger, Ralph Brad- 
ley, B. G. Greenberg, D. G. Horvitz, Carl Kossack, Herbert A. Meyer, John Pratt, 
Dorothy Gilford (ex officio), Martin Wilk (ex officio), W. H. Horton, J. D. Hromi. 

IMS WESTERN REGION COMMITTEE: Richard Link, Fred Andrews, Charles Bell, 
Tom Ferguson, John Gilbert, John Hofmann, Marion Sandomire, David Stoller, Robert 
Tate, J. R. Vatnsdal, Frank Massey. 

IMS COMMITTEE FOR SPECIAL INVITED PAPERS: W. S. Connor, G. E. P. Box, 
Kai-Lai Chung, W. Kruskal, G. E. Noether, M. B. Wilk, T. E. Harris, member ex officio 
(as Editor). 

IMS AD HOC COMMITTEE ON HIGH SPEED COMPUTING: A. S. Householder, G. S. 
Acton, R. L. Anderson, K. J. Arnold, C. F. Kossack, W. H. Kruskal, W. J. Merrill, H. A. 
Meyer, J. Moshman, H. W. Norton, G. J. Resnikoff, R. Slimak, Z. Szatrowski, 
D. Teichroew. 

IMS SUBSCRIPTIONS COMMITTEE: Edward Coleman, Joe Adams, K. A. Bush, Lila 
Elveback, Harry Harman. 

IMS COMMITTEE ON MATHEMATICAL TABLES: D. B. Owen, G. P. Steck, Paul Cox, 
Fred C. Leone, John W. Tukey, Marvin Zelen, R. L. Anderson, A. H. Bowker, E. E. 
Cureton, W. J. Dixon, C. W. Dunnett, Churchill Eisenhart, J. A. Greenwood, H. O. 
Hartley, William Kruskal, Daniel Teichroew, M. A. Woodbury, L. J. Savage (ex officio), 
J. Wolfowitz (ex officio). 

IMS AD HOC COMMITTEE ON ANNUAL MEETING POLICY: Leo Katz, Cecil Craig, 
Churchill Eisenhart, Robert Hooke, Henry Scheffe, Martin Wilk. 

AD HOC COMMITTEE TO INVESTIGATE THE POSSIBILITY OF BILLING FOR 
PUBLICATION IN THE ANNALS: A. M. Mood, A. H. Bowker, John Curtiss, Dorothy 
Gilford, W. Kruskal. 

IMS BLACKBOARD COMMITTEE: Irving W. Burr, Herbert Robbins, Martin Wilk. 

IMS EDITOR HUNT COMMITTEE: W. G. Cochran, T. W. Anderson, M. S. Bartlett, 
T. E. Harris, W. A. Wallis, S. S. Wilks. 

IMS COMMITTEE ON RUSSIAN TRANSLATIONS: Ingram Olkin, Eugene Lukacs. 

COMMITTEE TO ADVISE ON A POSSIBLE SUMMER INSTITUTE FOR 1959: David 
Blackwell, H. O. Hartley, Richard Savage, David Wallace, Max Woodbury. 

IMS REPRESENTATIVE TO AAAS: Harold Hotelling. 

AMERICAN STANDARDS ASSOCIATION COMMITTEE ON STATISTICAL NOMENCLATURE, 
IMS REPRESENTATIVE FOR 1957-58: Max Halperin. 

IMS REPRESENTATIVE IN DIVISION OF MATHEMATICS NATIONAL RESEARCH 
COUNCIL: W. Allen Wallis. 

REPRESENTATIVES TO CONFERENCE ORGANIZATION OF THE MATHEMATICAL 
SCIENCES: Joseph F. Daly for 1957-58, H. B. Mann for 1957-59. 




——


REPORT OF THE SECRETARY FOR 1958 


During the past year The Institute has held its 75th through 78th meetings. 
A business meeting was held during the 78th (21st Annual) meeting. The Pro- 
gram Committees are to be congratulated on the excellent programs which 
have been arranged under the immediate direction of David S. Stoller, Jack 
Silber, Herbert A. Meyer, and W. H. Kruskal with the overall guidance of our 
Program Coordinator, M. B. Wilk. The Assistant Secretaries, John F. Hof- 
mann, Herbert T. David, Marvin Kastenbaum, and John W. Pratt, are to be 
congratulated on the physical arrangements, and the Associate Secretaries, 
Evelyn Fix, Jack Silber, and Dorothy M. Gilford, on their performance of the 
duties of the Secretary with respect to meetings. 


——


MINUTES OF THE ANNUAL BUSINESS MEETING AUGUST 27, 1958 


The annual business meeting of the Institute of Mathematical Statistics was 
called to order at 5:30 p.m. August 27, 1958, in Kresge Auditorium in Cambridge, 
Massachusetts by George E. Nicholson, Jr., Secretary, in the absence 
of the President and President-Elect. Approximately 45 members were present. 

Minutes of the September 11, 1957 business meeting held in Atlantic City 
were approved. 

Reports of the Secretary, Treasurer, Editor, Program Coordinator and President 
were presented and accepted. 

Ballots were distributed to those members who had not voted by mail. 

A resolution expressing appreciation of the work of the local arrangement 
committee and the staff of M.I.T. for arrangements for the meeting was read 
and passed. 

G. E. Nicholson, Jr. announced that unless there were objections, W. H. 
Kruskal as an ex officio member of the Council would be ruled ineligible for 
election and that the four highest candidates after eliminating Kruskal would 
be ruled elected. It was moved, seconded and passed that the ruling be ac- 
cepted. 

The results of the election are as follows: J. W. Tukey, President-Elect; 
William Kruskal, Editor; T. E. Harris, Council Member, 1959-61; S. S. Wilks, 
Council Member, 1959-61; F. J. Anscombe, Council Member, 1959-61; Leo 
Katz, Council Member, 1959-61. 

The meeting was adjourned about 6:30 P.M. 


— 


REPORT OF THE EDITOR FOR 1958 


During the year ending July 31, the Annals received a larger number of new 
manuscripts than in any previous year (the total number of pages was slightly 
smaller than in the preceding year). The size of the printed volume for 1958 
will be about 1300 pages, and this has been adequate to maintain the backlog of 
accepted, unprinted papers at substantially less than one issue. The Council 
has taken notice of the financial problems posed by a larger Annals, and plans 
are now being discussed for raising the required funds. 
I take this opportunity to thank the many people whose hard work has been 
so valuable to the Annals during my editorial term, now ending. A list of persons 
who refereed papers during 1958 will be printed in an early issue. I thank par- 
ticularly Ann Greene, Jeanette Hiebert, Dorothy Stewart, Helena Williams, 
and Margaret Wray, who have carried on the work of the editorial office, and 
I want to express my appreciation to The RAND Corporation for making pos- 
sible my work on the Annals. 

T. E. Harris 
Editor 


—— 


PUBLICATIONS RECEIVED 